PHASAR/BioMeta - mining metabolite data from literature

General info

Date from - to
01 Nov 2004 - 01 Nov 2008
Project leader(s)
Koster, C.H.A. (Kees) Prof. dr.
Mons, Barend Prof. dr.
Leunissen, Jack Prof. dr.
Evelo, Chris T.A. Dr. ir.
de Jong, Franciska Prof. dr.


Text mining techniques are becoming increasingly important in bioinformatics, and many new developments in text mining are being pioneered in the bioinformatics context rather than in mainstream Information Retrieval. The PHASAR/BioMeta project is highly interdisciplinary in its set-up. On the Information Retrieval side, the PHASAR (Phrase-based Accurate Search And Retrieval) text mining system is being constructed for the automatic extraction of information from large amounts of literature. The resulting system will be generic in nature and, given suitable thesauri and ontologies, can be applied to other subject areas. Metabolites were selected as the test case because they are normally mentioned only in passing in articles dealing with other topics, and they occur in such diverse sources that manual extraction is practically impossible. Besides a working literature mining system, the project is constructing a detailed thesaurus of metabolite terminology and a database of metabolites and their relations.

Link to the end report of this project


  • Cross Language Information Retrieval for Biomedical Literature
  • Assignment of protein function and discovery of novel nucleolar proteins based on automatic analysis of MEDLINE
  • Cross Language Information Retrieval for biomedical literature
  • Lightweight gene name normalization by dictionary lookup
  • Measuring concept relatedness using language models
  • Biomedical cross-language information retrieval
  • Parsimonious concept modeling
  • Pathway enrichment based on text mining and its validation on carotenoid and vitamin A metabolism
  • Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification
  • Overview of BioCreative II gene normalization
  • Anni 2.0: a multipurpose text-mining tool for the life sciences
  • MeSH Up: effective MeSH text classification for improved document retrieval
  • Jane: suggesting journals, finding experts
  • The Influence of Basic Tokenization on Biomedical Document Retrieval
  • Parsing the Medical Corpus