Statistics for genomic data-analysis
General info
- Date
- 28,28 October and 2,3 November
- Location
- Amsterdam
- Website
- www.epidm.nl
- Keywords
- High-dimensional genomic data, cluster analysis, binary predictor, R
- Teacher(s)
- Dr. Renee de Menezes
- W. N. van Wieringen, PhD
- M. A. van de Wiel, PhD
- Contact(s)
- Mw. drs. M.C. Stuij
Description
With the advent of genomic techniques, many molecular aspects of the cell can now be measured. Data generated with such techniques are typically high-dimensional: for each of a limited number of patients (for instance), thousands of ‘genes’ are measured. This turns the traditional paradigm of statistics, in which a limited number of characteristics is recorded for many patients, upside down. Traditional statistical methods may therefore be inappropriate for high-dimensional genomic data, and new statistical methodology is required. This methodology is introduced through the analysis of gene expression and DNA copy number data. Central to this are the canonical questions:
• How to design an experiment? The principles of experimental design are illustrated in genomic experiments.
• How to get from raw intensity to biological signal? Several normalization techniques for both single and dual colour arrays are treated.
• How to find differentially expressed genes? Hypothesis testing is briefly reviewed. Terminology and problems surrounding the simultaneous testing of many hypotheses (genes) are introduced. The analysis of complex designs is introduced using the Bioconductor package Limma.
• How to identify (new) subgroups of samples? This is handled by means of hierarchical clustering methods and principal components analysis.
• How to predict clinical outcome? Several prediction methods and concepts like variable selection, cross-validation and evaluation measures are discussed.
The problems that the high-dimensional character of genomic data raises for each of these questions are treated both theoretically and in practice.
The course focus is on gene expression and DNA copy number arrays, but most methodology applies also to the analysis of data from (among others) microRNA expression and SNP arrays.
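The simultaneous-testing problem mentioned above can be illustrated with a small simulation. The sketch below is not course material: the course works in R, while this example uses plain Python with only the standard library. The function name benjamini_hochberg, the number of genes and the simulated p-values are all illustrative assumptions, and Benjamini-Hochberg is shown as one common correction, not necessarily the one taught.

```python
import random


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


random.seed(1)
n_genes = 5000  # hypothetical number of genes, none truly differential

# Under the null hypothesis, p-values are uniformly distributed on [0, 1].
null_pvalues = [random.random() for _ in range(n_genes)]

raw_hits = sum(p < 0.05 for p in null_pvalues)
bh_hits = sum(q < 0.05 for q in benjamini_hochberg(null_pvalues))

print("genes with raw p < 0.05:     ", raw_hits)
print("genes with adjusted p < 0.05:", bh_hits)
```

With thousands of genes and no real effect, roughly 5% of the raw p-values still fall below 0.05 by chance alone. The adjusted p-values are never smaller than the raw ones, so the correction can only shorten the list of reported genes.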
Learning objectives
1. The participant realizes that a fruitful experiment requires a good design.
2. The participant is familiar with R and is able to load genomic data into R and to install and load the packages required for the analysis.
3. The participant knows how to normalize gene expression data and DNA copy number data.
4. The participant is capable of analysing a comparative microarray experiment using Limma and identifying differentially expressed genes. In doing so, he/she knows how to account for the fact that multiple hypotheses are tested simultaneously.
5. The participant knows how to perform a cluster analysis and how to judge and visually present its results.
6. The participant is able to build and evaluate a binary predictor correctly.
7. The participant knows the pitfalls of existing analyses and is able to critically judge the statistical analysis of genomic data as performed by others.
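One well-known pitfall when evaluating a binary predictor is selecting variables on the full data set before cross-validating. The sketch below is an illustration under stated assumptions, not the course's own material: it uses plain Python, not R, on pure-noise data with an ad hoc mean-difference selection rule and a nearest-centroid classifier. All function names and parameter values are hypothetical.

```python
import random


def select_features(X, y, k):
    """Indices of the k features with the largest absolute mean difference."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]


def predict_nearest_centroid(X, y, features, sample):
    """Predict the class whose centroid on the selected features is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        dist = 0.0
        for j in features:
            centroid = sum(row[j] for row in rows) / len(rows)
            dist += (sample[j] - centroid) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy, with selection inside each fold or on all data."""
    all_data_features = select_features(X, y, k)  # sees the left-out sample
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if select_inside:
            features = select_features(X_train, y_train, k)
        else:
            features = all_data_features
        correct += predict_nearest_centroid(X_train, y_train, features, X[i]) == y[i]
    return correct / len(X)


random.seed(0)
n_samples, n_genes = 20, 500  # hypothetical sizes: few samples, many genes
X = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y = [0, 1] * (n_samples // 2)  # labels are unrelated to X: pure noise

biased = loocv_accuracy(X, y, k=10, select_inside=False)
honest = loocv_accuracy(X, y, k=10, select_inside=True)
print(f"selection outside CV: {biased:.2f}, selection inside CV: {honest:.2f}")
```

Because the labels carry no signal, an honest estimate should hover around chance level. Selecting genes on all samples first typically produces a much more optimistic figure, because the left-out sample has already influenced which genes were chosen.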
Format
This course is intensive and covers the basic concepts and methods required for the analysis of high-dimensional data. The statistical methods discussed are practiced on experimental microarray data with the statistical software R, which is free and open source. Every lecture is followed by a hands-on computer session, so students can quickly put theory into practice. The lecturers set aside ample time for questions and answers.
Medical papers that use the presented methodology are also discussed. The course concludes with a final assignment in which students perform a complete analysis using their newly acquired skills.
Target group
The course is tailored for researchers (such as pathologists, psychological biologists, human geneticists, oncologists, neuro-geneticists) whose research involves experiments that generate genomic data.