Protein knowledge building through comparative genomics and data integration (SP3.2.2)
Project leader: P.M.A. Groenen, Merck / Radboud University, Nijmegen
Participants: R.J. Siezen, Radboud University, Nijmegen / NIZO food research; J. Heringa, Vrije Universiteit, Amsterdam; J. Leunissen, Wageningen University and Research Centre; A.E. Gorbalenya, Leiden University Medical Centre
Currently, the function of only about 50-80% of the proteins in each genome is known or predicted. This fraction can be increased by comparative genomics and data integration. In this project, a repository is being constructed of the most accurate sequence similarity information available from all fully sequenced genomes. These sequence similarities form the basis of a phylogeny-based protein database. In addition, enhanced methods of functional annotation, based on both sequence homology and non-homology approaches, are being developed. Finally, a data warehouse is being developed for enriched protein information, coupled with improved and robust visualization techniques. Together, these resources allow sophisticated data mining and knowledge building in the areas of biomedicine and biotechnology.
Overview of subprojects and results:
Project leader: R.J. Siezen, Radboud University, Nijmegen / NIZO food research
Introduction and objectives
The objective of this project is to improve protein functional annotation at genome scale using integrative bioinformatics approaches.
Tools have been constructed to increase the accuracy and sensitivity of large-scale prediction of bacterial protein subcellular location (SCL), which is directly related to the functional annotation of proteins. Furthermore, with bacterial species as target organisms, SCL prediction was applied extensively in combination with various heterogeneous sources of knowledge, with a focus on extracellular and surface-associated proteins; this resulted in improved bacterial genome annotation. Part of the work concerned lactic acid bacteria, which are widely used for food fermentation and have been shown to have probiotic effects. The publicly accessible databases Locatep-DB and LAB-secretome were created to store the research results. These repositories are updated regularly in order to provide stable access for the biology and bioinformatics community.
Project leader: J. Heringa, Vrije Universiteit, Amsterdam
Building on the notion that structure is more conserved than sequence, this project has focused on improving multiple sequence alignment by exploiting various predicted structural features. Special emphasis was placed on the multiple alignment of transmembrane sequences, a sequence category that is known to be difficult to align.
The techniques resulting from this research have been implemented in the multiple sequence alignment package PRALINE. Using the optimised multiple alignments, a technique was devised to predict residues that confer functional specificity within protein families. A new sequence entropy measure was implemented in the method Sequence Harmony, which shows improved prediction coverage and specificity relative to comparable methods. Finally, the project focused on protein structure-to-structure comparison, testing the influence of structural dynamics and malleability on structure alignment. Protein structural variance was obtained from Molecular Dynamics simulations and from alternative structural depositions in the Protein Data Bank (PDB). The results showed that structural alignment can be dramatically affected by structural variation, which has general implications for structural comparison and associated benchmarking. Multiple sequence alignment and functional specificity prediction tools are indispensable, and consequently ubiquitous, in biomedical research. All software resulting from this project has been made available to the public via web servers, all of which are heavily used.
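To illustrate how an entropy-based measure can flag candidate specificity-conferring positions, the following minimal sketch compares the residue composition of two subfamilies column by column in an existing alignment. The exact Sequence Harmony formula is not given here, so the score below is an assumption: a symmetric, entropy-based overlap (one minus the Jensen-Shannon divergence in bits) that equals 1 for identical compositions and 0 for non-overlapping ones. The function names column_frequencies, overlap_score and rank_columns are hypothetical and are not part of the Sequence Harmony or PRALINE software.

```python
from collections import Counter
from math import log2


def column_frequencies(column):
    """Relative residue frequencies in one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    return {r: n / total for r, n in counts.items()} if total else {}


def overlap_score(col_a, col_b):
    """Entropy-based overlap of two subfamily columns (illustrative assumption).

    Returns a value between 0 (no shared residue types) and 1 (identical
    composition), or None if either column contains only gaps.
    """
    p_a = column_frequencies(col_a)
    p_b = column_frequencies(col_b)
    if not p_a or not p_b:
        return None
    score = 0.0
    for residue in set(p_a) | set(p_b):
        a = p_a.get(residue, 0.0)
        b = p_b.get(residue, 0.0)
        # Terms with zero frequency contribute nothing and are skipped.
        if a > 0:
            score -= 0.5 * a * log2(a / (a + b))
        if b > 0:
            score -= 0.5 * b * log2(b / (a + b))
    return score


def rank_columns(group_a, group_b):
    """Rank alignment columns from lowest to highest overlap.

    group_a and group_b are lists of equal-length aligned sequences.
    Columns with the lowest overlap are the strongest candidates.
    """
    scored = []
    for index, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b))):
        score = overlap_score(col_a, col_b)
        if score is not None:
            scored.append((index, score))
    return sorted(scored, key=lambda item: item[1])
```

For example, rank_columns(["ACD", "ACE"], ["AKD", "AKE"]) places column 1 first, because the two groups share no residue type at that position. In practice, the quality of such a ranking depends on the quality of the underlying multiple alignment.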