Statistical integration of multiple-source high-throughput genomics and phenotypic data
General info
- Date from - to
- 01 May 2010 - 30 Apr 2014
- Project leader(s)
- Dr. Renee de Menezes
Abstract
Aim of the project:
Develop novel statistical methods to identify candidate genes whose expression is associated with variation in DNA copy number, sequence and methylation, and which may underlie a clinical trait. We will focus on both cancer genomics and complex diseases.
Key objectives:
- Develop a model based on the global test that uses subgroups of markers in both dimensions, e.g. SNPs and gene expression values, to identify regions of association between genotype and phenotype (expression). Also develop a method to compare patterns of association between two independent datasets, in order to identify phenotype-specific association signatures.
- Extend the integration model from two to three data sources: in addition to copy number, other mechanisms, including DNA methylation and loss of heterozygosity, are likely to play a role in aberrant expression.
- Develop methods for classification and prediction using two types of high-throughput data simultaneously, e.g. both expression and copy number.
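As a rough illustration of the last objective, the sketch below classifies samples from two data types at once by standardising each block, concatenating them, and applying a nearest-centroid rule. This is not the project's method: the classifier, the synthetic data, and all names (fit_scaler, fit_centroids, predict, expression, copy_number) are assumptions chosen only to show what "using two types of data simultaneously" can look like in practice.

```python
import numpy as np

def fit_scaler(block):
    """Return per-feature mean and standard deviation of a training block."""
    sd = block.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return block.mean(axis=0), sd

def fit_centroids(features, labels):
    """Mean feature vector per class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features, centroids):
    """Assign each sample to the class with the nearest centroid."""
    classes = list(centroids)
    dist = np.stack(
        [np.linalg.norm(features - centroids[c], axis=1) for c in classes], axis=1
    )
    return np.array(classes)[dist.argmin(axis=1)]

# Synthetic example (assumed data): 60 samples, 20 genes per data type,
# with a class shift in the first three features of each block.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
expression = rng.normal(size=(60, 20))
copy_number = rng.normal(size=(60, 20))
expression[labels == 1, :3] += 2.0
copy_number[labels == 1, :3] += 2.0

train = np.arange(60) % 2 == 0
test = ~train

# Standardise each data type with training-set statistics, then concatenate.
blocks_train, blocks_test = [], []
for block in (expression, copy_number):
    mean, sd = fit_scaler(block[train])
    blocks_train.append((block[train] - mean) / sd)
    blocks_test.append((block[test] - mean) / sd)
x_train = np.hstack(blocks_train)
x_test = np.hstack(blocks_test)

centroids = fit_centroids(x_train, labels[train])
predicted = predict(x_test, centroids)
accuracy = float((predicted == labels[test]).mean())
```

Standardising each block separately keeps one platform from dominating the distance purely because of its measurement scale; a real analysis would likely also need feature selection or shrinkage.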
Approach:
The use of high-density platforms to measure SNPs, copy number and methylation on the same samples is increasing. We have developed a high-throughput, generic approach to test associations between genetic variation and gene expression, representing a direct cellular phenotype, by modelling two types of microarray data using a single regression model. We are able to detect subtle effects of mild copy number alterations by taking into account multiple genes in the altered region using a random-effects model. In this project we will extend the current integration model.
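To make the idea of testing one gene's expression against several markers in a region more concrete, here is a minimal sketch of a score-type statistic in the spirit of a global test, with a permutation p-value. It is an illustration under stated assumptions, not the project's implementation: the statistic Q = r'XX'r / r'r, the permutation scheme, the synthetic data, and the function names global_test_statistic and permutation_p_value are all hypothetical choices made for this example.

```python
import numpy as np

def global_test_statistic(y, X):
    """Q = r' X X' r / (r' r), where r is the centred response and
    X holds column-centred markers (samples in rows)."""
    r = y - y.mean()
    Xc = X - X.mean(axis=0)
    s = Xc.T @ r  # covariance-like score of each marker with the response
    return float(s @ s) / float(r @ r)

def permutation_p_value(y, X, n_perm=999, seed=0):
    """Permutation p-value: shuffle the response to break any association."""
    rng = np.random.default_rng(seed)
    observed = global_test_statistic(y, X)
    exceed = 0
    for _ in range(n_perm):
        if global_test_statistic(rng.permutation(y), X) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic example (assumed data): 60 samples, 10 copy number markers in a
# region, with expression depending on the first five markers.
rng = np.random.default_rng(1)
n_samples, n_markers = 60, 10
copy_number = rng.normal(size=(n_samples, n_markers))
expression = copy_number[:, :5].sum(axis=1) + rng.normal(size=n_samples)

p_assoc = permutation_p_value(expression, copy_number)
p_null = permutation_p_value(rng.normal(size=n_samples), copy_number)
```

Because the statistic pools the scores of all markers in the region, several individually weak effects pointing to the same gene can add up to a detectable signal, which is the intuition behind treating the marker effects as random.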

