Structure validation for modelling purposes (SP 3.1.2)

Project leader: G. Vriend, Radboud University, Nijmegen

3D modelling is crucial for the study of proteins, but very complicated. Homology modelling is the preferred, but imperfect, approach. Many errors in the details of protein structures can only be detected when structure validation tools are used in combination with experimental data. This project concentrates on the validation of structures solved by either X-ray crystallography or NMR. The validation software will provide automatic feedback to protein structure refinement software, so that re-refinement of (old) structures with modern software becomes possible in cases where the experimental data are available.

Overview of subprojects and results:

Subproject SP 3.1.2.1

Project leader: G. Vriend, Radboud University, Nijmegen

Introduction and objectives

The function of a protein and the molecular interactions that are needed to perform this function can only be truly understood with a structural, i.e. three-dimensional, description of the protein. In such a description the relative positions, the coordinates, of all the atoms in the protein should be known. Unfortunately, these coordinates cannot be measured directly. A structure model must be made based on experimental data from X-ray crystallography or, to a lesser extent, nuclear magnetic resonance (NMR) and electron microscopy (EM). Such an experimental model can then be used in computational studies to model protein-small molecule interactions via drug docking, or to model the mechanism of the protein's function by molecular dynamics. Experimental structure models are also used to predict the structure of a similar protein, through a process called homology modelling. The quality of experimental structure models limits the quality of the derived computational models and of the conclusions drawn from them. Or, in more positive terms: better structure models lead to better science.

This project set out to find new means to validate experimental structure models. The aim was to assess not only the validity or correctness of these models, but also their value (i.e. how much new knowledge can be derived from them), so as to ultimately obtain better computational structure models.

Results

First, the existing validation methods were examined to find features of protein structure that could not yet be validated. For two of these, bound metal ions and carbohydrates, new validation algorithms were developed and used to expose errors in existing structure models in the main repository for experimental structure models, the PDB.
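
Validation of a bound metal ion, for example, largely comes down to geometric checks on its coordination sphere. The sketch below is a minimal illustration of that idea, assuming hypothetical target distances; real validation tools derive their reference values from large sets of curated structures.

```python
import math

# Hypothetical target distances (Angstrom) for common metal-ligand
# contacts; real validators derive such values from curated structures.
TARGETS = {("ZN", "N"): (2.05, 0.15),   # (expected mean, tolerance)
           ("ZN", "S"): (2.30, 0.15),
           ("MG", "O"): (2.07, 0.15)}

def dist(a, b):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def check_metal_site(metal, metal_xyz, ligands):
    """Return (element, observed, expected) for suspicious contacts.

    ligands: list of (element, (x, y, z)) for the coordinating atoms.
    """
    flags = []
    for element, xyz in ligands:
        if (metal, element) not in TARGETS:
            continue
        mean, tol = TARGETS[(metal, element)]
        d = dist(metal_xyz, xyz)
        if abs(d - mean) > tol:
            flags.append((element, round(d, 2), mean))
    return flags

# Toy zinc site: both nitrogen contacts sit at implausible distances.
site = [("N", (1.0, 0.0, 0.0)), ("N", (0.0, 2.6, 0.0)), ("S", (0.0, 0.0, 2.3))]
print(check_metal_site("ZN", (0.0, 0.0, 0.0), site))
```
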
With these and other validation methods in place, ways were sought not only to measure the quality of experimental structure models, but actually to improve it. For existing structure models PDB_REDO was created, which takes a structure model and the original experimental data and uses the latest computational methods to re-refine the model fully automatically. Important features are an improved description of the concerted movement of protein atoms and a better balance between experimental data and prior knowledge of protein structure. PDB_REDO models showed an improved fit with the experimental data as well as better geometric quality. It was shown that the new models provide better templates for homology modelling and better targets for drug docking studies. In the course of the project, all X-ray structure models in the PDB were optimised and made publicly available through the PDB_REDO databank (www.cmbi.ru.nl/pdb_redo). These improved structure models are now used by an increasing number of bioinformaticians, molecular biologists and drug designers.
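
The fit between a model and the diffraction data is conventionally summarised by the crystallographic R-factor, R = Σ|Fobs - Fcalc| / Σ|Fobs|; a successful re-refinement lowers it (and the cross-validated R-free) on the same reflections. A minimal sketch with invented amplitudes:

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum(|Fobs - Fcalc|) / sum(|Fobs|)."""
    f_obs, f_calc = np.asarray(f_obs), np.asarray(f_calc)
    return np.abs(f_obs - f_calc).sum() / np.abs(f_obs).sum()

# Invented structure-factor amplitudes for the same four reflections.
f_obs = np.array([120.0, 85.0, 60.0, 40.0])   # measured
f_old = np.array([100.0, 95.0, 50.0, 47.0])   # original model
f_new = np.array([115.0, 88.0, 58.0, 42.0])   # re-refined model
print(r_factor(f_obs, f_old))   # ~0.154
print(r_factor(f_obs, f_new))   # ~0.039, an improved fit
```
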
For new experimental structure models, a set of validation criteria was put together that should be applied to all models entering the PDB. The PDB is now implementing recommendations formulated within this project. This will ultimately lead to better structure models in the PDB, serving a wide audience of users.
PDB_REDO is being developed further to allow the creation of even better structure models, and it is now being brought to the X-ray crystallography community so that crystallographers can benefit from the results of this project.

Prediction of gene function and regulation (SP 3.2.1)

Project leader: C.J.F. ter Braak, Wageningen University and Research Centre

Participants: R. van Ham, Wageningen University and Research Centre; E. Cuppen, Hubrecht Institute, Utrecht

To predict the function and regulation of genes, evidence from various data sources has to be pieced together. The challenge is to do this on a genome-wide scale and with high sensitivity and specificity. This project distinguishes three research lines:
- A probabilistic framework for gene function prediction, based on Saccharomyces cerevisiae. This framework is being expanded to more complex plant genomes.
- The role of alternative splicing in gene functionalisation in plants, as well as its impact on computational methods of gene function prediction.
- Genomes of animal species are explored for functional elements and genetic variations, concentrating on non-protein coding genomic elements (e.g. promoter elements, but also microRNA-coding sequences).

Overview of subprojects and results:

Subproject SP3.2.1.1

Project leader: C.J.F. ter Braak, Wageningen University and Research Centre

Introduction and objectives

This project builds on a probabilistic framework for gene function prediction. The framework integrates data sources that are both feature-based (for example, does gene X have functional domain A?) and interaction- or network-based (for example, do the proteins derived from a pair of genes interact?). The first type of data is useful for prediction if genes that have domain A predominantly perform a particular function F and the second type of data is useful if two genes that interact often perform a similar function (guilt by association). 

Results

An efficient Bayesian computational method (nicknamed BMRF, for Bayesian Markov Random Field analysis) was developed to make it possible to apply the framework to genome-wide gene function prediction in yeast (Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana. The project resulted in a conceptually simple method for gene function prediction that outperformed other recently developed, conceptually more complicated methods in Arabidopsis thaliana. Both the predicted gene functions for Arabidopsis thaliana and the computer algorithm used to produce them have been made available on the internet.
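
The BMRF software itself is not reproduced here, but the toy sketch below illustrates the underlying idea, with an invented network and numbers: each unannotated gene's label is resampled from a probability that combines a feature-based prior with the labels of its network neighbours (guilt by association), and posterior probabilities are read off from the samples.

```python
import math
import random

# Toy undirected interaction network and invented feature-based priors
# (log-odds that a gene has function F, e.g. from "has domain A").
network = {"g1": ["g2", "g3"], "g2": ["g1", "g3"],
           "g3": ["g1", "g2", "g4"], "g4": ["g3"]}
prior = {"g1": 1.5, "g2": 0.5, "g3": 0.0, "g4": -1.0}
known = {"g1": 1}       # g1 is already annotated with function F
coupling = 0.8          # strength of the guilt-by-association term

def gibbs(n_sweeps=3000, burn_in=500, seed=0):
    rng = random.Random(seed)
    labels = {g: known.get(g, rng.choice([0, 1])) for g in network}
    counts = dict.fromkeys(network, 0)
    for sweep in range(n_sweeps):
        for g in network:
            if g in known:
                continue
            # Local log-odds: feature prior plus agreement of neighbours.
            field = prior[g] + coupling * sum(2 * labels[n] - 1
                                              for n in network[g])
            labels[g] = 1 if rng.random() < 1 / (1 + math.exp(-field)) else 0
        if sweep >= burn_in:
            for g in network:
                counts[g] += labels[g]
    return {g: counts[g] / (n_sweeps - burn_in) for g in network}

print(gibbs())   # estimated posterior P(gene has function F)
```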

Subproject SP3.2.1.3

Project leader: E. Cuppen, Hubrecht Institute, Utrecht

Introduction and objectives

Technologies that allow the generation of complete inventories of genetic variation in genomes are developing very rapidly, and affordable personal genomes will soon be a reality. While some information relevant to the interpretation of biology, such as disease and disease susceptibility, can already be derived from these data, the function of most genomic regions, as well as the effects of genetic variation on them, remains to be elucidated.

Results

This project employed high-throughput experimental methods and developed bioinformatics tools and algorithms to reliably detect a class of regulatory genes, the so-called miRNAs, in a quantitative, genome-wide manner. Furthermore, genome-wide inventories of a relevant class of genetic variation, copy number variants (CNVs), were generated in the rat, a genetic model system; these showed that CNVs do affect gene-expression levels.
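
As a rough illustration of how CNVs can be detected from sequencing data, the hypothetical sketch below flags genomic windows whose read depth deviates strongly from the chromosome-wide average; the window size and cut-off are invented, and production CNV callers use considerably more sophisticated statistics.

```python
import statistics

def cnv_windows(depth, window=100, z_cut=2.5):
    """Flag fixed-size windows whose mean read depth is an outlier.

    depth: per-base read depth along a chromosome.
    Returns (start, end, "gain"/"loss") for candidate CNV windows.
    """
    means = [sum(depth[i:i + window]) / window
             for i in range(0, len(depth) - window + 1, window)]
    mu = statistics.mean(means)
    sd = statistics.stdev(means) or 1.0
    calls = []
    for k, m in enumerate(means):
        z = (m - mu) / sd
        if abs(z) >= z_cut:
            calls.append((k * window, (k + 1) * window,
                          "gain" if z > 0 else "loss"))
    return calls

# Toy signal: ~30x coverage with a duplicated (double-depth) segment.
depth = [30] * 1000 + [60] * 200 + [30] * 800
print(cnv_windows(depth))   # [(1000, 1100, 'gain'), (1100, 1200, 'gain')]
```
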
Taken together, the research performed in this project has resulted in the identification of novel functional elements in genomes and has demonstrated mechanisms by which genetic variation affects biological characteristics. These results will contribute to our understanding of genomes as well as the interpretation of personal genomes.

Protein knowledge building through comparative genomics and data integration (SP3.2.2)

Project leader: P. M.A. Groenen, Merck / Radboud University, Nijmegen

Participants: R.J. Siezen, Radboud University, Nijmegen / NIZO food research; J. Heringa, Vrije Universiteit, Amsterdam; J. Leunissen, Wageningen University and Research Centre; A.E. Gorbalenya, Leiden University Medical Centre

Currently, the function of only about 50-80% of the proteins in each genome is known or predicted. This fraction can be increased by comparative genomics and data integration. In this project, a repository is constructed of (the most) accurate sequence similarity information from all fully sequenced genomes. These sequence similarities form the basis of a phylogeny-based protein database. In addition, enhanced methods for functional annotation based on sequence homology and non-homology methods are being developed. Finally, a data warehouse is being developed for enriched protein information, coupled with improved and robust visualization techniques. This allows sophisticated data mining and knowledge building in the areas of biomedicine and biotechnology.

Overview of subprojects and results:

Subproject SP3.2.2.4

Project leader: R.J. Siezen, Radboud University, Nijmegen / NIZO food research

Introduction and objectives

The objective of this project is to improve protein functional annotation at genome scale using integrative bioinformatics approaches.

Results

Tools were constructed to increase the accuracy and sensitivity of large-scale prediction of bacterial protein subcellular location (SCL), which is directly related to the functional annotation of proteins. Furthermore, by extensively applying SCL prediction to bacterial target organisms in combination with various heterogeneous sources of knowledge, with a focus on extracellular and surface-associated proteins, this work achieved improved bacterial genome annotation. Part of the work was closely related to lactic acid bacteria (LAB), which are widely used for food fermentation and have been shown to have probiotic effects. The publicly accessible databases LocateP-DB and LAB-secretome were created to store the research results. These repositories are regularly updated in order to provide lasting access for the biology and bioinformatics communities.

Subproject SP3.2.2.5

Project leader: J. Heringa, Vrije Universiteit, Amsterdam

Introduction and objectives

Using the notion that structure is more conserved than sequence, this project focused on improving multiple sequence alignment techniques by exploiting various predicted structural features. Special emphasis was put on the multiple alignment of transmembrane sequences, a sequence category that is known to be difficult to align.

Results

The techniques resulting from this research have been implemented in the multiple sequence alignment package PRALINE. Using the optimised multiple alignments, a technique was devised to predict functional-specificity-conferring residues within protein families. A new sequence entropy measure was implemented in the method Sequence Harmony, which shows improved prediction coverage and specificity relative to its counterparts. Finally, the project focused on protein structure-to-structure comparison, where the influence of structural dynamics and malleability on structure alignment was tested. Protein structural variance was obtained using molecular dynamics simulations and alternative structural depositions in the Protein Data Bank (PDB). The results showed that structural alignment can be dramatically affected by structural variation, which has general implications for structural comparison and associated benchmarking. Multiple sequence alignment and functional specificity prediction tools are indispensable, and consequently ubiquitous, in biomedical research. All software resulting from this project has been made available to the public via webservers, which are heavily used.
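
Sequence Harmony's exact scoring function is not reproduced here, but the sketch below captures the flavour of such entropy-based measures: per alignment column, it quantifies how differently two protein subfamilies use the residue alphabet (here via the Jensen-Shannon divergence), so columns that separate the subfamilies, candidate specificity-conferring residues, score highest.

```python
import math
from collections import Counter

def column_divergence(col_a, col_b):
    """Jensen-Shannon divergence between the residue distributions of
    one alignment column in two subfamilies (0 = same composition,
    1 = completely different composition)."""
    def freqs(column):
        counts = Counter(column)
        return {aa: c / len(column) for aa, c in counts.items()}
    pa, pb = freqs(col_a), freqs(col_b)
    m = {aa: 0.5 * (pa.get(aa, 0.0) + pb.get(aa, 0.0))
         for aa in set(pa) | set(pb)}
    def kl(p):
        return sum(p[aa] * math.log2(p[aa] / m[aa]) for aa in p)
    return 0.5 * kl(pa) + 0.5 * kl(pb)

# Toy subfamilies: columns 2 and 3 separate them, columns 0 and 1 do not.
fam_a = ["ACDK", "ACEK", "ACDK"]
fam_b = ["ACRG", "ACRG", "ACKG"]
for i in range(4):
    print(i, round(column_divergence([s[i] for s in fam_a],
                                     [s[i] for s in fam_b]), 2))
```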

Phenotype clustering of multi-factorial diseases (SP 3.3.1)

Project leader: A. de Graaf, TNO / Leiden University

Participants: J. van der Greef, TNO; J. Leunissen, Wageningen University and Research Centre; H. Brunner, University Medical Centre St. Radboud, Nijmegen

Different people react differently to drugs. This simple fact creates a major challenge for drug design. Biology is an integrated system of genetic, protein, metabolite, cellular, and pathway events that are interdependent. These biological elements largely define the phenotype. With advanced bioinformatics tools, interlinked data repositories with species-specific molecular genetic information can be mined, and new phenotype-related clustering strategies can be developed. This generates model descriptions, based on systems biology, and enables personalized medication and nutrition.

Overview of subprojects and results:

Subproject SP3.3.1.3

Project leader: H. Brunner, University Medical Centre St. Radboud, Nijmegen

Introduction and objectives

While progress in medical research has been outstanding in recent years, human biology is still poorly understood; its complexity is beyond the scope of any single researcher. The human genome project provided the foundation for a revolution in genetics, and this work is being accelerated by technologies such as microarrays and deep sequencing. However, these technologies swamp researchers with vast amounts of data that are very challenging to analyse. Key to analysing these data, and translating them into better diagnosis and treatment of genetic diseases, are bioinformatics studies linking genotype to phenotype.

Results

This project set out to investigate the relationships between disease genes and disease phenotypes, with the aim of better identifying which genes are involved in which diseases. Based on the concept that similar disease phenotypes are caused by mutations in functionally related genes, functional genomics data – protein-protein interactions and mRNA co-expression – were used to identify and prioritize candidate disease genes for genetically heterogeneous diseases. These are diseases that can be caused by mutations in any of several different genes. Given that these mutations lead to the same disease, it is logical to suppose that the genes are involved in the same or similar biological processes. It was found that such genes do indeed have more protein-protein interactions with each other and that they are co-ordinately expressed. In these studies, comparative genomics was also used to augment the human data with data from other species, leveraging evolutionary conservation to improve predictions. The principle behind this is that, if a relationship between genes is conserved in different species over a long evolutionary period, it must be important to the biology of the organisms.
In addition to the genetic level, human diseases were also analysed at the phenotypic level. Despite the ready availability of phenotypic information on genetic diseases, this kind of information has historically been undervalued and there are few databases containing systematic descriptions of disease phenotypes. This is partly due to the high cost of systematically phenotyping patients and the lack of standardization. To aid this process, it was investigated which properties of systematic phenotype descriptions are most relevant to disease biology. Such knowledge can inform phenotyping efforts, enabling them to focus on the most important features. The necessity of systematic phenotyping was demonstrated by showing that under-annotated phenotypes poorly reflect underlying disease biology. It was further demonstrated that for each disease, the most prevalent features are also the most informative.

Subproject SP3.3.1.4

Project leader: A. de Graaf , TNO / Leiden University

Introduction and objectives

Elevated plasma cholesterol is a significant risk factor for cardiovascular disease. Cholesterol in plasma is carried in particles called lipoproteins, which range from the large, low-density chylomicrons via VLDL, IDL and LDL to the small, dense HDL particles. While standard clinical methods measure only LDL-cholesterol (LDL-C) and HDL-cholesterol (HDL-C), technical improvements have resulted in analytical methods that can differentiate the cholesterol in as many as 20 different lipoprotein size classes.
As a result of these developments, much more information of potential diagnostic value can be extracted from the plasma lipoprotein profile. However, it becomes increasingly difficult to designate specific cholesterol fractions as being “bad” or “good”. To help solve this problem, in this project the lipoprotein profile was considered as the result of different biological processes that are active in the body. These include the production of large lipoproteins (VLDL) by the liver, the gradual reduction of particle size through the loss of fat and cholesterol (lipolysis) from particles to body tissues and, at the end of the particle life cycle, the re-uptake of small particles by the liver and other tissues.

Results

A computer model was developed that contains mathematical equations for the three processes mentioned. The model simulates the simultaneous life cycles of thousands of particles of different sizes present in the blood. Using this model, one can now predict which lipoprotein profile results from different relative activities of lipoprotein production, lipolysis and uptake. Conversely, the computer model also makes it possible to predict the activity of the three processes from a given person’s lipoprotein profile.
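
The published model's equations are not reproduced here; the following toy simulation merely illustrates how the three processes shape a profile. Each particle enters at the largest size class, shrinks through lipolysis events and leaves the pool on uptake; all rates and class counts are invented.

```python
import random

def simulate_profile(production=1.0, lipolysis=0.30, uptake=0.05,
                     n_sizes=20, seed=1):
    """Toy lipoprotein profile resulting from three process activities.

    production scales the particle influx; each particle starts in the
    largest size class (VLDL-like), moves down one class per lipolysis
    event and leaves the pool on uptake. The returned counts per size
    class are a stand-in for the measured cholesterol profile.
    """
    rng = random.Random(seed)
    n_particles = int(production * 20000)
    profile = [0] * n_sizes
    for _ in range(n_particles):
        size = n_sizes - 1
        while size >= 0:
            profile[size] += 1
            if rng.random() < uptake:     # re-uptake by liver/tissues
                break
            if rng.random() < lipolysis:  # loses lipid, shrinks
                size -= 1                 # past class 0 counts as cleared
    return profile

# Faster uptake empties the small-size end of the profile.
print(simulate_profile(uptake=0.05)[:5])
print(simulate_profile(uptake=0.20)[:5])
```
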
The computer predictions were first compared to the results of studies in which the activity levels of the processes had actually been determined via complex stable-isotope labelling experiments. It turned out that there is a good match between the model and the experimental data. It was then demonstrated that the three calculated process activity levels could be used to diagnose various types of dyslipidemia (disturbed plasma lipid levels) in patients.
It is currently being tested whether the model can likewise be used to improve the diagnosis of cardiovascular disease, using data from the Framingham Heart Study. Results obtained so far are promising.

In summary, describing and computer-modelling the complex lipoprotein profile using available knowledge of the underlying biological processes has the potential to generate improved diagnostic biomarkers.

Development of generic integration methodology towards a life-sciences problem-solving environment by modelling of data and knowledge (SP3.5.1)

Project leader: T. Breit, University of Amsterdam

Participants: P. Adriaans, University of Amsterdam; J. Kok, Leiden University; F. Verbeek, Leiden University; M. Roos, University of Amsterdam

The overall aim of this project was to develop problem-solving environments (PSEs) for computational life-sciences research. For this, semantically annotated data and biological knowledge were used in computational experiments to develop innovative methods, for example for data analysis or knowledge discovery. Driven by a particular biological problem related to a test case, some analyses were performed using ad-hoc methods. This also involved research on the application of statistical, model-based, or data-mining methods to specific problems. Furthermore, ad-hoc problem solving provided requirements and raw building blocks for the development of the generic problem-solving environment. Using the results of the information analysis and the lessons learned from the problem-driven approach, state-of-the-art methodology was developed and evaluated for the creation of dedicated life-sciences problem-solving environments. This included research on how ontologies could provide a formal and unambiguous representation of data for robust and flexible data integration, as well as on the application of knowledge models in computational life-sciences research.

Overview of subprojects and results:

Subproject SP3.5.1.1 & SP 3.5.1.5 (merged)

Project leader: T. Breit, University of Amsterdam

Introduction and objectives

Within this project, the avenues towards the establishment of functional PSEs (problem-solving environments) were studied.

Results

  • Proof-of-principle semantic-web approach for data integration (Swedi).
  • Successful integration of R in a Grid environment.
  • Development and opening of the e-BioLab (13 December 2007).
  • Development of a re-annotation tool to re-define Affymetrix probe sets: CDF-Merger.
  • Workflow for re-annotation of microarray probes in the BIOMOBY framework: OligoRAP.
  • Grid-based workflow to discover significantly enriched windows: SigWinDetector.
  • Packaging of SigWinDetector in R: SigWinR.
  • Generic problem-solving environment for genomics data from multi-strain prokaryotes, plus an adapted generic ENSEMBL genome browser: PROGENIUS.
  • Use of R in Taverna: Rshell.
  • Active participation in the W3C Semantic Web Health Care and Life Sciences Interest Group.
  • Strong collaboration established between NBIC and the internationally leading e-Science organizations OMII-UK and myGrid.

A semantic-web approach for data integration was developed and tested; workflow-management systems such as Taverna and the concept of web services were evaluated; technologies using community-maintained ontologies, such as BIOMOBY, were adapted; and de-facto standards such as the Ensembl genome browser and the R environment were used. It was found that in the life sciences, rich and well-defined frameworks for the definition of metadata are still in their infancy. This applies to the technologies and the way life scientists can make use of them, as well as to the quality of the ontologies. As a result, the focus shifted to PSEs designed around de-facto standards such as Ensembl and R. This resulted in highly functional PSEs that are flexible, because all complexity remains accessible, and usable, because the user interface can easily be adapted to the intended users. Effectively, these PSEs have become 'playgrounds' where e-bioscientists find (modelled) data, other biological models, analysis tools, modelling tools, and visualization tools. This enables them to do computer-assisted research on complex biological problems: to improve their knowledge by easily correlating and reviewing (omics) data in various ways, develop new ideas, study well-defined biological problems, or discover fundamental phenomena.

Integrative bioinformatics with data model enabled data analysis: test case industrial microorganisms (SP 3.7.1)

Project leader: R.J. Siezen, Radboud University, Nijmegen / NIZO food research

Participants: A. Smilde, University of Amsterdam; T. Breit, University of Amsterdam; J. B.T.M. Roerdink, University of Groningen; O.P. Kuipers, University of Groningen; B. Poolman, University of Groningen

Microorganisms are widely used as cell factories. To improve these factories, generate new products or enhance performance, the factories must be studied as integrated systems. In this project, we use '~omics' data for discovery and hypothesis generation in life-sciences research, based on a microorganism test case. For this, a new strategy is needed that enables the integration of heterogeneous models and data, as well as methods for the analysis and visualization of such heterogeneous data. The use of data and knowledge models for data annotation and integration forms the basis for a powerful, robust, and scalable integrative bioinformatics methodology.

Overview of subprojects and results:

Subproject SP3.7.1.1

Project leader: R.J. Siezen, Radboud University, Nijmegen / NIZO food research

Introduction and objectives

This project encompasses a set of bioinformatics studies on genome-scale metabolic networks and their applicability to the analysis of data from high-throughput techniques. An important preceding step in the application of metabolic networks is their construction, in which comparative genomics of sequenced genomes plays an important role. For that reason a comparative genomics approach was evaluated and used in a novel method to accelerate the construction of metabolic networks. The application of metabolic networks for the prediction and analysis of data is also addressed.

Results

A comparative genomics study is presented that evaluates the effect of gene duplication on function prediction by orthology (equivalent genes between species, originating from the last common ancestor). The question is raised whether one-to-one orthologs, which are most similar at the sequence level, are indeed the most likely functional equivalents when duplicates exist. This is done by analyzing orthologs between pairs of genomes where, in one genome, the orthologous gene duplicated after the speciation of the two species (yielding so-called inparalogs). Conservation of gene neighbourhood (i.e., the positioning of genes on the genome) is used as an indicator of functional equivalence. Although in the majority of investigated cases the most sequence-similar orthologs do conserve the gene neighbourhood, a substantial fraction do not.
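
A minimal sketch of the gene-neighbourhood criterion, using invented toy genomes: the score for a candidate ortholog pair is the fraction of one gene's neighbours whose orthologs also flank the other gene.

```python
def neighbourhood(genome, gene, k=3):
    """Genes within k positions of `gene` in an ordered toy genome."""
    i = genome.index(gene)
    return set(genome[max(0, i - k):i] + genome[i + 1:i + 1 + k])

def neighbourhood_conservation(genome_a, gene_a, genome_b, gene_b,
                               orthologs, k=3):
    """Fraction of gene_a's neighbours whose orthologs flank gene_b."""
    mapped = {orthologs[g] for g in neighbourhood(genome_a, gene_a, k)
              if g in orthologs}
    if not mapped:
        return 0.0
    return len(mapped & neighbourhood(genome_b, gene_b, k)) / len(mapped)

# b2 and b2p are inparalogs of a2; b2 retains the ancestral context.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b1", "b2", "b3", "b4", "b2p"]
orthologs = {"a1": "b1", "a3": "b3", "a4": "b4"}
print(neighbourhood_conservation(genome_a, "a2", genome_b, "b2", orthologs))
print(neighbourhood_conservation(genome_a, "a2", genome_b, "b2p", orthologs))
# prints 1.0 for the context-conserving copy, ~0.67 for the other
```
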
The methods developed were subsequently used to accelerate the reconstruction of genome-scale metabolic networks/models with comparative genomics and manually curated networks. On the basis of the reconstructed metabolic networks, modelling was employed to study global metabolic function at the systems level. An integrative approach is presented that addresses the question to what extent transcriptional co-regulation of genes can be explained (predicted) by systems properties of genome-scale metabolic networks. Most studies have addressed regulation through static graph-theoretical descriptions of metabolic networks. In this project, the metabolic networks of Escherichia coli and Saccharomyces cerevisiae were modelled, and a correlation was found between the type of flux coupling and the co-regulation of genes at the level of operon organization, co-expression and transcription-factor binding. Moreover, flux coupling and the graph-theoretical measure of shortest-path distance were evaluated in the context of co-regulation, with the conclusion that flux coupling explains co-regulation better. Furthermore, it was demonstrated that the concept of flux coupling can be used to explain specific patterns in functional genomics data, as well as modes of evolution shaping complex systems. It was asked whether asymmetric relations between reactions are reflected in evolution. For this, presence/absence patterns of proteins/reactions across species (also referred to as phylogenetic profiles) and their ancestral states were explored. Moreover, the occurrence of asymmetric relations was examined in gene essentiality (the effect of single-gene knockouts on fitness/growth) and in expression data across environmental conditions.
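
Flux coupling itself can be illustrated with a small linear-programming exercise: two reactions are (fully) coupled when, at steady state, fixing the flux through one pins down the flux through the other. The sketch below does this for a toy three-reaction pathway; genome-scale analyses work the same way, only on much larger stoichiometric matrices.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (rows = metabolites, columns = reactions)
# for a linear pathway: uptake R0 -> A, conversion R1: A -> B, secretion R2.
S = np.array([[1, -1, 0],    # metabolite A
              [0, 1, -1]])   # metabolite B
BOUNDS = [(0, 10)] * 3       # irreversible reactions, arbitrary capacity

def flux_range(target, fixed, value):
    """Feasible (min, max) of v[target] with S v = 0 and v[fixed] = value."""
    out = []
    for sign in (1, -1):
        c = np.zeros(S.shape[1])
        c[target] = sign
        bounds = list(BOUNDS)
        bounds[fixed] = (value, value)
        res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
        out.append(sign * res.fun)
    return tuple(out)

# Forcing one unit of flux through the secretion reaction pins the uptake
# flux to exactly one unit as well: the two reactions are fully coupled.
print(flux_range(target=0, fixed=2, value=1.0))   # (1.0, 1.0)
```
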
Finally, a view on the topic of metabolic adaptation was given. Deciphering the adaptive properties underlying the structure and function of metabolic networks is one of the central interests of network biology. Many properties can be inferred from networks, such as global topology, flux states (distributions) and mutational robustness. Are these properties the result of adaptation, favoured by selection? Evolutionary processes alternative to direct selection on the property under investigation could also play a role, but are often ignored. For example, certain systems-level traits might simply arise as by-products of selection on other traits. Metabolic network properties were reviewed in the light of adaptations and by-products, and future strategies to investigate metabolic adaptations are proposed.

Subproject SP3.7.1.3

Project leader: T. Breit, University of Amsterdam

Introduction and objectives

Taking a systems approach to improve the understanding of a biological system means that heterogeneous data and models need to be integrated and analysed as such. In order to provide systems-level understanding, techniques must be developed that allow the combination of views from different disciplines and biological levels of abstraction. As part of this project, the application of robust and generic solutions for systems-level data integration is investigated. The modelling of data and knowledge is an important aspect of this approach. International standards are studied, such as biological ontologies (e.g. the Gene Ontology), the OWL standard (Web Ontology Language), RDF (Resource Description Framework), and XML (eXtensible Markup Language). The mapping and annotation of experimental data with ontologies, and their subsequent use for integration and analysis, is investigated. A component-based approach that applies standards, for instance using web services, ensures re-usability of models, data, and methods.
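
As a small, hypothetical example of what such ontology-based annotation looks like in practice, the sketch below stores an expression measurement as RDF triples, links the gene to a Gene Ontology term, and retrieves the integrated record with a SPARQL query (namespaces and values are invented).

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Invented namespaces; a real project would reuse community ontologies.
EX = Namespace("http://example.org/experiment/")
GO = Namespace("http://purl.obolibrary.org/obo/GO_")

g = Graph()
g.bind("ex", EX)

# Annotate a measured gene with an ontology term and an expression value.
gene = URIRef(EX + "gene/yfg1")
g.add((gene, RDF.type, EX.Gene))
g.add((gene, EX.annotatedWith, URIRef(GO + "0006412")))  # "translation"
g.add((gene, EX.expressionLevel, Literal(2.4)))

# Integration and analysis then become queries over the merged graph.
query = """SELECT ?gene ?term ?level WHERE {
             ?gene ex:annotatedWith ?term ;
                   ex:expressionLevel ?level . }"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.gene, row.term, row.level)
```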

Results

  • Adaptation of a computer-engineering model (Artemis), aimed at the development of embedded systems by design space exploration, for use in a biological case study.
  • An approach and requirements for the integration of measurement data as part of a semantic framework for computational experimentation.
  • An interactive workflow for detecting ridges (regions of increased gene expression) in gene expression profiles (SigWinDetector).
  • Development and application of a methodology for operon prediction in prokaryote genomes based on the integration of in silico and wet-lab data.
  • A genome-centric database to present the results of the operon findings to biologists.
  • Proof that analyzing prokaryotic expression data on a per-operon basis yields large differences compared to per-gene analysis in gene set enrichment analyses (illustrated below).
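
The per-gene versus per-operon difference in the last point can be illustrated with a standard hypergeometric enrichment test (all counts below are invented): the genes of one operon behave as a single transcriptional unit, so counting them individually inflates the apparent significance.

```python
from scipy.stats import hypergeom

def enrichment_p(population, in_category, drawn, drawn_in_category):
    """Upper-tail hypergeometric p-value, as in gene set enrichment."""
    return hypergeom.sf(drawn_in_category - 1, population,
                        in_category, drawn)

# Per-gene counting: one differentially expressed 5-gene operon shows up
# as 5 seemingly independent hits in a 40-gene category.
print(enrichment_p(2000, 40, 50, 5))    # looks highly significant
# Per-operon counting: the same signal is a single transcriptional unit.
print(enrichment_p(800, 16, 20, 1))     # unremarkable
```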

In this project, the potential of using engineering methods, originally developed for the design of embedded computer systems, to analyze biological cell systems was studied. For embedded systems as well as for biological cell systems, design is a feature that defines their identity. The assembly of different components in designs of both kinds of systems can vary widely. In contrast to the biology domain, the computer-engineering domain has the ability to quickly evaluate design options and their consequences through methods for computer-aided design, in particular design space exploration. It was found that there are enough concrete similarities between the two kinds of systems to assume that the engineering methodology from the computer-systems domain, in particular that related to embedded systems, can be applied to the domain of cellular systems. This will help to understand the myriad of design options cellular systems have.

Subproject SP3.7.1.4

Project leader: J.B.T.M. Roerdink, University of Groningen

Introduction and objectives

The goal of this subproject was the development of methods and tools for automatic and interactive visualization of "virtual cell" components, in particular of regulatory networks and metabolic pathways.

Results

Interaction networks in biology are very complex, since interactions take place not only at the genomic, proteomic, and metabolomic levels, but also between these levels. To deal with this complexity, a software framework was established that is able to visualize such networks and that offers interactive exploration to the researcher. As part of this effort, an application called GENeVis was developed, which allows the simultaneous visualization of gene networks and gene expression time-series data. It has features that were lacking in existing tools, such as the mapping of expression values and corresponding confidence values to a single visual attribute, multiple-time-point visualization, visual comparison of multiple time series, and support for statistical data analysis. Various interaction mechanisms, such as panning, zooming, highlighting, data selection, tooltips, and subnetwork views, support data analysis and exploration.
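
The sketch below, a hypothetical stand-in rather than GENeVis code, shows the core idea of mapping expression values onto a network drawing; GENeVis additionally encodes confidence values, time series and the interaction mechanisms listed above.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Invented regulatory network with one expression value per gene.
edges = [("tf1", "gA"), ("tf1", "gB"), ("gA", "gC"), ("gB", "gC")]
expression = {"tf1": 0.2, "gA": 1.8, "gB": -1.3, "gC": 0.6}

g = nx.DiGraph(edges)
pos = nx.spring_layout(g, seed=42)
nodes = list(g.nodes)

# Map expression onto node colour (blue = down, red = up).
nx.draw_networkx(g, pos, nodelist=nodes,
                 node_color=[expression[n] for n in nodes],
                 cmap=plt.cm.coolwarm, vmin=-2.0, vmax=2.0)
plt.axis("off")
plt.show()
```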

To enhance the use of GENeVis, SpotXplore was developed, a plug-in for Cytoscape, an open-source software platform for visualizing complex networks and integrating them with any type of attribute data.

Subproject SP3.7.1.5

Project leader: O.P. Kuipers, University of Groningen

Introduction and objectives

This project set out to determine the genetic network and operon structures of Lactococcus lactis based on experimental transcriptome data.

Results

Using DNA microarray data, many new and unexpected transcriptional correlations were discovered between the expression profiles of genes and operons. These interactions were annotated using a metabolic network developed in another related project. The research performed in this project will help in the description of known transcriptional interactions and provides interesting directions for further research. The models generated in this project hold especially promising leads for the metabolic engineering of L. lactis. This application has high industrial relevance as L. lactis is not only a workhorse in the dairy industry, but is also of interest for the production of oral vaccines.

Subproject SP3.7.1.6

Project leader: B. Poolman, University of Groningen

Introduction and objectives

Data analysis of the protein composition of complex biological samples, for instance to discover new biomarkers for disease, inevitably starts with collecting accurate and complete data sets. Protein composition analysis (proteomics) is nowadays almost always performed by shotgun proteomics, using liquid chromatography (LC) coupled with tandem mass spectrometry (MS/MS). The LC-MS/MS technique is widely used not only for the identification but also for the quantification of proteins in complex samples. This project aimed to develop methods to improve a typical mass-spectrometry-based pipeline (Figure 1).

Results

The approach encompassed investigating and addressing various bottlenecks in the pipeline and performing statistical analysis on the collected quantification data. This has led to the publication of several algorithms and scripts that offer advantages in terms of increasing the amount of valid protein data (Fig. 1.1), the reliability of the data (Fig. 1.2) and the interpretation of quantitative differences between samples (Fig. 1.3). These new tools in the proteomics toolbox help to facilitate proteomics workflows in the applied sciences, including medical and industrial use.
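
A typical final step of such a pipeline, testing per-protein quantitative differences between sample groups, might look like the sketch below (invented data; the project's actual scripts are not reproduced): a Welch t-test per protein followed by Benjamini-Hochberg control of the false discovery rate.

```python
import numpy as np
from scipy import stats

def differential_proteins(case, control, alpha=0.05):
    """Indices of proteins that differ between groups at FDR `alpha`.

    case, control: (n_proteins, n_replicates) arrays of log quantities.
    Welch t-test per protein, Benjamini-Hochberg step-up correction.
    """
    _, p = stats.ttest_ind(case, control, axis=1, equal_var=False)
    order = np.argsort(p)
    m = len(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    return order[:k]

# Invented data: only protein 0 is truly upregulated in the cases.
rng = np.random.default_rng(0)
control = rng.normal(10.0, 0.2, size=(5, 4))
case = control + rng.normal(0.0, 0.2, size=(5, 4))
case[0] += 1.0
print(differential_proteins(case, control))   # typically just protein 0
```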

Figure 1. A typical LC-MS/MS analysis pipeline is shown. The highlighted steps were explored in this project.