Statistics for Affymetrix GeneChips – Exercises
These exercises are designed to give you some practical experience using the Bioconductor R package affylmGUI to carry out analyses of data from Affymetrix GeneChips. They are based on the estrogen experiment (more on this below). We will loosely follow the affylmGUI documentation, which has a worked example based on this data set (as well as on others).
After the course is completed, you should turn in a SHORT lab report (6 pages maximum) on your investigation of the estrogen data set. The goal of the study is to compare gene expression between sets of conditions (with and without estrogen, at two times). Write your report for the lead scientist in the study, giving enough information to make clear your understanding of the material from the course and sufficient justification for any data analysis decisions the scientist will need to take.
You can turn in your report up to 2 weeks after the end of the module. Please email your report (as a PDF file or a zipped DOC file) to Darlene.Goldstein@epfl.ch; check ahead of time if you need to use a different format.
Your report will be evaluated taking into account: overall presentation, statement of background and study objectives, summary of quality assessment (including supporting graphs), description of statistical analyses carried out (including description of any models fitted, design matrix, contrasts if necessary, etc.), (apparent) correctness of results (including some kind of table giving genes that are affected by estrogen, along with the top 50 genes in any case), and conclusions. You cannot get full credit by simply turning in the HTML report that you can generate with affylmGUI.
Day 1
Today you will load the data into R and use affylmGUI to explore chip quality and quantify gene expression (obtain normalized log signal values).
First, open a web browser and go to http://bioinf.wehi.edu.au/affylmGUI/ then click on Documentation (with screenshots) -> Estrogen data set. Then, read the background information at the top of the page. The Estrogen data set (CEL files) should already be available on your computer at C:\Program Files\R\R-2.4.1\library\estrogen\extdata. Copy the CEL files into your working directory. Next, download the targets file from http://bioinf.wehi.edu.au/affylmGUI/EstrogenTargets.txt into the same directory as the CEL files (your working directory). You can skip the section REQUIREMENTS FOR RUNNING affylmGUI.
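If you prefer to copy the CEL files from within R rather than by hand, the following is a minimal sketch using only base R. The helper name copy_cel_files is hypothetical, and the usage lines assume that the estrogen data package is installed so that system.file() can locate its extdata folder.

```r
# Hypothetical helper: copy every CEL file from src_dir into dest_dir.
copy_cel_files <- function(src_dir, dest_dir = getwd()) {
  # Match both .cel and .CEL file extensions.
  cel <- list.files(src_dir, pattern = "\\.cel$", ignore.case = TRUE,
                    full.names = TRUE)
  if (length(cel) == 0) {
    return(character(0))
  }
  copied <- file.copy(cel, dest_dir, overwrite = FALSE)
  basename(cel[copied])  # names of the files that were actually copied
}

# Usage: system.file() returns "" when the package is not installed.
src <- system.file("extdata", package = "estrogen")
if (nzchar(src)) {
  print(copy_cel_files(src))
}

# The targets file could be fetched in a similar way, for example:
# download.file("http://bioinf.wehi.edu.au/affylmGUI/EstrogenTargets.txt",
#               destfile = "EstrogenTargets.txt")
```

Files that already exist in the destination are left alone, so the helper can be rerun safely.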
When you have finished, start R: Start menu -> Programs -> R 2.4.1. Load the package affylmGUI in R either from the Packages menu -> Load package ... -> affylmGUI -> OK, or by typing
library(affylmGUI)
Read in the estrogen data, including the CEL files and the Targets file. Once you have successfully loaded the data, carry out some exploratory data analyses, as demonstrated in the documentation. Make sure that you do these for each of the arrays, not just the single-array examples in the Documentation pages. ONE NOTE: the Intensity Density Plot option has a bug and probably will not work properly in the version installed on your computer.
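If loading the data fails, a frequent cause is a mismatch between the Targets file and the CEL files in the working directory. The sketch below checks this with base R only. It assumes the Targets file is tab-delimited with columns named Name, FileName and Target (an assumption about the file layout; compare with your own copy), and the function name check_targets is hypothetical.

```r
# Hypothetical helper: read the targets file and confirm that every CEL file
# it lists is present in cel_dir.
check_targets <- function(targets_file = "EstrogenTargets.txt", cel_dir = ".") {
  targets <- read.delim(targets_file, stringsAsFactors = FALSE)

  # Assumed column names; stop early if any of them is missing.
  required <- c("Name", "FileName", "Target")
  missing_cols <- setdiff(required, colnames(targets))
  if (length(missing_cols) > 0) {
    stop("Targets file lacks column(s): ", paste(missing_cols, collapse = ", "))
  }

  # Warn about listed CEL files that cannot be found on disk.
  found <- file.exists(file.path(cel_dir, targets$FileName))
  if (!all(found)) {
    warning("CEL file(s) not found: ",
            paste(targets$FileName[!found], collapse = ", "))
  }
  targets
}

# Usage: run only when the targets file is in the working directory.
if (file.exists("EstrogenTargets.txt")) {
  print(check_targets())
}
```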
To look at data quality, you might start with Image Array Plot (in the Plot menu) for each array. When you are asked, you should choose the
option that will display the images in R.
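For comparison, array images can also be drawn at the R command line. This is only a sketch, not what affylmGUI does internally: it assumes the Bioconductor package affy is installed, that the CEL files and EstrogenTargets.txt are in the working directory, and that the Targets file has a FileName column. The function name plot_array_images is hypothetical.

```r
# Hypothetical helper: read the CEL files named in the targets file and draw
# one image per array, followed by boxplots of the raw intensities.
plot_array_images <- function(targets_file = "EstrogenTargets.txt",
                              cel_dir = getwd()) {
  library(affy)  # provides ReadAffy() and the AffyBatch image/boxplot methods

  targets <- read.delim(targets_file, stringsAsFactors = FALSE)
  abatch <- ReadAffy(filenames = targets$FileName, celfile.path = cel_dir)

  # One image per array, so that no chip is skipped.
  for (i in seq_along(sampleNames(abatch))) {
    image(abatch[, i])
  }

  # Side-by-side boxplots of the unnormalized log intensities.
  boxplot(abatch)

  invisible(abatch)
}

# Usage (once the data files are in place):
# abatch <- plot_array_images()
```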
When you get to Normalization in the Documentation page, skip down to the bottom of the page and find the section Probe-Level Linear Model Normalization and Quality Plots. Here, you can use robust regression to explore chip quality (don't worry if the computer is a little slower here; this part takes a little time). You should get an Image Quality Plot for each chip and examine it to get some idea of the overall quality. Should any of the chips be excluded from further analysis?
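A comparable probe-level model fit can be sketched at the command line with the Bioconductor package affyPLM, assuming it is installed. The function name chip_quality_plm is hypothetical, and abatch is assumed to be an AffyBatch object (for example, the result of ReadAffy() from the affy package). The RLE and NUSE boxplots are an addition not mentioned in the exercise text.

```r
# Hypothetical helper: fit a robust probe-level model and draw quality plots.
chip_quality_plm <- function(abatch) {
  library(affyPLM)  # provides fitPLM(), RLE(), NUSE() and image() for PLMset

  # Robust regression fit of the probe-level model (this step is slow).
  pset <- fitPLM(abatch)

  # Pseudo-image of the regression weights for each chip; large regions of
  # low weight point to spatial artifacts.
  for (i in seq_along(sampleNames(abatch))) {
    image(pset, which = i, type = "weights")
  }

  # Chip-level summaries: boxplots that stand apart from the rest suggest
  # a lower-quality chip.
  RLE(pset, main = "Relative Log Expression")
  NUSE(pset, main = "Normalized Unscaled Standard Errors")

  invisible(pset)
}

# Usage, given an AffyBatch called abatch:
# pset <- chip_quality_plm(abatch)
```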
Once you have decided about chip quality, go back up to the Normalization section and choose RMA (there are 3 options, not 2 as described in the Documentation). You don't need to save the normalized values, but doing so is useful practice if you will be analyzing microarray data in the future.
Work through the rest of the Normalization section, including
exploratory data analysis (plots).
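The RMA step has a direct command-line counterpart in the affy package. The sketch below assumes affy is installed and that abatch is an AffyBatch containing only the chips you decided to keep. The function name normalize_rma and the output file name in the usage line are hypothetical.

```r
# Hypothetical helper: compute RMA expression values, optionally save them,
# and draw boxplots of the normalized values.
normalize_rma <- function(abatch, out_file = NULL) {
  library(affy)  # provides rma() and exprs()

  # RMA: background correction, quantile normalization and probe-set
  # summarization; the resulting values are on the log2 scale.
  eset <- rma(abatch)

  # Optionally write the genes-by-arrays matrix as a tab-delimited file.
  if (!is.null(out_file)) {
    write.table(exprs(eset), file = out_file, sep = "\t",
                quote = FALSE, col.names = NA)
  }

  # After normalization the per-array distributions should look alike.
  boxplot(exprs(eset), las = 2, main = "RMA expression values")

  invisible(eset)
}

# Usage, given an AffyBatch called abatch:
# eset <- normalize_rma(abatch, out_file = "estrogen_rma.txt")
```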
You will probably have time to get a start on your final report. You can save your work for easy loading tomorrow; see the section Save Your Work near the bottom of the Documentation page.
Day 2
Today, continue the analysis of the estrogen data set by using linear modeling and empirical Bayes statistics to look for genes affected by estrogen. You might also explore other specific questions, such as which genes are affected at only one of the two times.
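To make the design matrix and contrasts concrete, here is a minimal sketch of one possible parameterization. Everything specific in it is an assumption for illustration: the condition labels (EstAbsent10, EstPresent10, EstAbsent48, EstPresent48), two replicate arrays per condition, and simulated values standing in for the RMA output. The limma calls run only if the limma package is installed.

```r
# Assumed layout: two replicate arrays per condition. Take the real labels
# from the Target column of your own targets file.
target <- rep(c("EstAbsent10", "EstPresent10", "EstAbsent48", "EstPresent48"),
              each = 2)
group <- factor(target, levels = unique(target))

# Design matrix with one coefficient per condition (no intercept).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrasts written out by hand; rows follow the columns of 'design'.
contrast_matrix <- cbind(
  Est10       = c(-1,  1,  0, 0),  # estrogen effect at the earlier time
  Est48       = c( 0,  0, -1, 1),  # estrogen effect at the later time
  Interaction = c( 1, -1, -1, 1)   # Est48 minus Est10
)
rownames(contrast_matrix) <- colnames(design)

# Simulated log-expression values (200 genes by 8 arrays) as a stand-in.
set.seed(1)
expr <- matrix(rnorm(200 * 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)                 # gene-wise linear models
  fit2 <- limma::contrasts.fit(fit, contrast_matrix)
  fit2 <- limma::eBayes(fit2)                       # moderated statistics
  top50 <- limma::topTable(fit2, coef = "Est10", number = 50,
                           adjust.method = "BH")
  print(head(top50))
}
```

Genes that show up for only one of Est10 and Est48, or that have a large Interaction contrast, are candidates for being affected at only one of the two times. With simulated noise the table itself is meaningless; replace expr with your normalized values.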