Development of generic integration methodology towards a life-sciences problem-solving environment by modelling of data and knowledge (SP3.5.1)
Project leader: T. Breit, University of Amsterdam
Participants: P. Adriaans, University of Amsterdam; J. Kok, Leiden University; F. Verbeek, Leiden University; M. Roos, University of Amsterdam
The overall aim of this project was to develop problem-solving environments (PSEs) for computational life-sciences research. To this end, semantically annotated data and biological knowledge were used in computational experiments to develop innovative methods, for example for data analysis or knowledge discovery. Driven by a particular biological problem related to a test case, some analyses were performed using ad-hoc methods. This also involved research on the application of statistical, model-based, or data-mining methods to specific problems. Furthermore, the ad-hoc problem solving provided requirements and raw building blocks for the development of the generic problem-solving environment. Using the results of the information analysis and the lessons learned from the problem-driven approach, state-of-the-art methodology was developed and evaluated for the creation of dedicated life-sciences problem-solving environments. This included research on how ontologies could be used to provide a formal and unambiguous representation of data for robust and flexible data integration, as well as on the application of knowledge models in computational life-sciences research.
Overview of subprojects and results:
Subprojects SP3.5.1.1 & SP3.5.1.5 (merged)
Project leader: T. Breit, University of Amsterdam
Introduction and objectives
Within this project, avenues towards the establishment of functional problem-solving environments (PSEs) were studied.
Results
- Proof-of-principle semantic web approach for data integration (Swedi).
- Successful integration of R in a Grid environment.
- The development and opening of the e-BioLab (13 December 2007).
- Development of a re-annotation tool to redefine Affymetrix probe sets: CDF-Merger.
- Workflow for re-annotation of microarray probes in BIOMOBY framework: OligoRAP.
- Grid-based workflow to discover significantly enriched windows: SigWinDetector.
- Packaged SigWinDetector in R: SigWinR.
- Generic problem-solving environment for genomics data from multi-strain prokaryotes plus adapted generic ENSEMBL genome browser: PROGENIUS.
- Using R in Taverna: Rshell.
- Active participation in the W3C Semantic Web Health Care and Life Sciences Interest Group.
- Established strong collaboration between NBIC and the internationally leading e-Science organizations OMII-UK and myGrid.
A semantic-web approach for data integration was developed and tested; workflow-management systems such as Taverna and the concept of web services were evaluated; technologies using community-maintained ontologies such as BIOMOBY were adapted; and de-facto standards such as the Ensembl Genome Browser and the R environment were used. It was found that, in the life sciences, rich and well-defined frameworks for the definition of metadata are still in their infancy. This applies to the technologies and the way life scientists can make use of them, as well as to the quality of the ontologies. As a result, the focus shifted to PSEs built on de-facto standards such as Ensembl and R. This resulted in highly functional PSEs that are flexible, because all complexity is accessible, and usable, because the user interface can easily be adapted to the intended users. In effect, these PSEs have become 'playgrounds' where the e-bioscientist finds (modelled) data, other biological models, analysis tools, modelling tools, and visualization tools. This enables him/her to do computer-assisted research on complex biological problems and to improve his/her knowledge by easily correlating and reviewing (omics) data in various ways, developing new ideas, studying well-defined biological problems, or discovering fundamental phenomena.

