
Meaningful variation

NBIC

11 Aug 2011

In the world of functional genomics (e.g. transcriptomics, proteomics, metabolomics), data analysis knows no shades of grey. Either you opt for univariate analysis, in which the focus is on a single gene or metabolite, or you look at all variables in one go, i.e. the multivariate approach. "Univariate analysis is attractive because many methods are available and interpretation is relatively easy, but the correlations between different biological actors are left out. You lose a lot of valuable information that way", says NBIC faculty member Professor Age Smilde of the Biosystems Data Analysis group at the Swammerdam Institute for Life Sciences (University of Amsterdam). "With multivariate analysis, all correlations are included, but the drawback is that interpretation becomes difficult. We have therefore come up with a middle course: simplivariate models."

Fluttering
The central idea behind the simplivariate approach is that a substantial part of the variation in the data offers no information about the biology of the system. Smilde explains: "Many variables are simply 'fluttering'. They vary, but their variation is not connected to any underlying biological phenomenon that is relevant to the research question. This is the type of variation we call non-informative. With our simplivariate approach, we can identify groups of variables that show coherent behaviour, which is an important clue to the biological processes that may underlie them. Employing our algorithm is a first step towards tackling a complex dataset more efficiently."

Coherence
To determine what type of behaviour should be considered 'coherent', close involvement of biologists is essential, according to Smilde. "You need continuous discussions with the biologists, for example to learn how many biological phenomena they expect to see, so that you can judge whether your analysis makes any sense. They also need to tell you how strong a correlation has to be before it is considered relevant. Based on this information, the bioinformaticians can choose the right model to apply; our approach offers several possibilities." The recently published algorithm (Saccenti et al., PLoS One, 2011) builds on earlier work on simplivariate models. "We have incorporated a new model, and both the statistics and the algorithm itself have been improved", says Smilde. Work is currently under way within the Data Support Platform of the Netherlands Metabolomics Centre, in which Smilde is also involved, to turn the simplivariate algorithm into a robust and user-friendly tool in collaboration with NBIC. Smilde expects the tool to become available within six months to a year.
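To make the notion of 'coherent behaviour' and a biologist-supplied correlation cut-off a little more concrete, here is a minimal sketch. It is not the simplivariate algorithm itself (for that, see the paper by Saccenti et al.); it simply groups variables whose pairwise Pearson correlation reaches a chosen threshold and treats the rest as 'fluttering'. The function names `pearson` and `coherent_groups`, the use of the absolute correlation (so that anti-correlated variables also count as coherent), the transitive linking of variables into groups, and the example metabolite data are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of measurements."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def coherent_groups(data, threshold):
    """Hypothetical grouping: link variables whose |r| >= threshold, transitively.

    data: dict mapping variable name -> list of measurements across samples.
    Returns sorted groups (sorted lists of names) with at least two members;
    variables that correlate strongly with nothing are left out as 'fluttering'.
    """
    names = sorted(data)
    parent = {name: name for name in names}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in combinations(names, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(g for g in groups.values() if len(g) > 1)


# Made-up example: five samples, four metabolites.
data = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with m1
    "m3": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as m1 rises
    "m4": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

print(coherent_groups(data, threshold=0.8))  # prints [['m1', 'm2', 'm3']]
```

With a cut-off of 0.8, m4 ends up in no group and plays the role of a 'fluttering' variable; lowering the cut-off to 0.3 would pull it into the group, which illustrates why the threshold has to come from the biological question rather than from the data alone.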

In the meantime, interested readers can consult the paper:
Simplivariate models: uncovering the underlying biology in functional genomics data
Saccenti E, Westerhuis JA, Smilde AK, van der Werf MJ, Hageman JA, Hendriks MM
PLoS One 2011;6(6):e20747


By: Esther Thole