Statistics for Affymetrix GeneChips – Exercises

 

These exercises are designed to give you some practical experience using the Bioconductor R package affylmGUI to analyze data from Affymetrix GeneChips. They are based on the estrogen experiment (more on this below). We will loosely follow the affylmGUI documentation, which has a worked example based on this data set (as well as on others).

 

After the course is completed, you should turn in a SHORT lab report (6 pages maximum) on your investigation of the estrogen data set. The goal of the study is to compare gene expression between sets of conditions (with and without estrogen, at each of two times). Write your report for the lead scientist in the study, giving enough information to make clear your understanding of the material from the course and sufficient justification for any data analysis decisions the scientist will need to take. You can turn in your report up to 2 weeks after the end of the module. Please email your report (as a PDF file or a zipped doc file) to Darlene.Goldstein@epfl.ch; check ahead of time if you need to use a different format.

 

Your report will be evaluated taking into account: overall presentation, statement of background and study objectives, summary of quality assessment (including supporting graphs), description of statistical analyses carried out (including description of any models fitted, design matrix, contrasts if necessary, etc.), (apparent) correctness of results (including a table of the genes that are affected by estrogen; in any case, list the top 50 genes), and conclusions. You cannot get full credit by turning in the HTML report that affylmGUI can generate.

 

Day 1

 

Today you will load the data into R and use affylmGUI to explore chip quality and quantify gene expression (obtain normalized log signal values). 

 

First, open a web browser and go to http://bioinf.wehi.edu.au/affylmGUI/ then click on Documentation (with screenshots) -> Estrogen data set. Then, read the background information at the top of the page. The Estrogen data set (CEL files) should already be available on your computer at C:\Program Files\R\R-2.4.1\library\estrogen\extdata. Copy the CEL files into your working directory. Next, download the targets file from http://bioinf.wehi.edu.au/affylmGUI/EstrogenTargets.txt into the same directory as the CEL files (your working directory). You can skip the section REQUIREMENTS FOR RUNNING affylmGUI.
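
Before starting affylmGUI, it can help to confirm from the R prompt that the targets file and the CEL files really are in the same directory. The following is only a minimal sketch in base R and is not part of the affylmGUI documentation. The function name check_targets is hypothetical, and the sketch assumes that the targets file is tab-delimited and has a FileName column naming the CEL files.

```r
# Hypothetical helper: report which CEL files named in the targets file
# are missing from the directory that holds the targets file.
# Assumes a tab-delimited targets file with a FileName column.
check_targets <- function(targets_file = "EstrogenTargets.txt") {
  targets <- read.delim(targets_file, stringsAsFactors = FALSE)
  if (!"FileName" %in% names(targets)) {
    stop("no FileName column found in ", targets_file)
  }
  # Look for each CEL file next to the targets file.
  cel_paths <- file.path(dirname(targets_file), targets$FileName)
  missing <- targets$FileName[!file.exists(cel_paths)]
  if (length(missing) > 0) {
    message("Missing CEL files: ", paste(missing, collapse = ", "))
  } else {
    message("All ", nrow(targets), " CEL files found.")
  }
  invisible(missing)
}
```

Calling check_targets() from your working directory prints a short message and returns (invisibly) the names of any CEL files that could not be found.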

 

When you have finished, you should start R:  Start menu -> Programs -> R 2.4.1.  Load the package affylmGUI in R either from the Packages menu -> Load package ... -> affylmGUI -> OK, or by typing

 

library(affylmGUI)

 

Read in the estrogen data, including the CEL files and the Targets file. Once you have successfully loaded the data, carry out some exploratory data analyses, as demonstrated in the documentation. Make sure that you do these for each of the arrays, not just for the single-array examples in the Documentation pages. ONE NOTE: the Intensity Density Plot option has a bug and probably will not work properly in the version installed on your computer.

 

To look at data quality, you might start with Image Array Plot (in the Plot menu) for each array.  When you are asked, you should choose the option that will display the images in R. 

When you get to Normalization in the Documentation page, you should skip down to the bottom of the page and find the section Probe-Level Linear Model Normalization and Quality Plots. Here, you can use robust regression to explore chip quality (don't worry if the computer is slow here; this part takes a little time). You should get an Image Quality Plot for each chip and examine it to get some idea of the overall quality. Should any of the chips be excluded from further analysis?
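
If you are comfortable at the R prompt, a similar probe-level model fit can be sketched with the Bioconductor package affyPLM. This is an illustration under stated assumptions, not a description of what the affylmGUI menu item runs internally. The function name plm_quality is hypothetical, and the sketch assumes that the affy and affyPLM packages are installed and that the CEL files are in your working directory.

```r
# Hypothetical helper: fit a probe-level model by robust regression, then
# draw one weight image per chip plus NUSE and RLE boxplots.
# Assumes affy and affyPLM are installed and the CEL files are in cel_dir.
plm_quality <- function(cel_dir = getwd()) {
  library(affyPLM)                          # also attaches affy
  raw <- ReadAffy(celfile.path = cel_dir)   # reads every CEL file in cel_dir
  fit <- fitPLM(raw)                        # robust fit; this step is slow
  for (i in seq_along(sampleNames(raw))) {
    # Regions of low weight point to possible artifacts on chip i.
    image(fit, which = i, type = "weights")
  }
  NUSE(fit)   # boxplots of normalized unscaled standard errors
  RLE(fit)    # boxplots of relative log expression
  invisible(fit)
}
```

Each plot replaces the previous one in the graphics window, so you may want to step through the chips one at a time rather than run the whole function at once.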

 

Once you have decided about chip quality, go back up to the Normalization section, and choose RMA (there are 3 options, not 2 as described in the Documentation). You don't need to save the values, but doing so might be useful practice if you will be analyzing microarray data in the future. Work through the rest of the Normalization section, including exploratory data analysis (plots).
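
For comparison, RMA values can also be computed directly at the R prompt with the Bioconductor package affy. The sketch below is one way to do this, assuming affy is installed and the CEL files are in your working directory. The function name rma_values and its out_file argument are hypothetical.

```r
# Hypothetical helper: compute RMA expression values for all CEL files
# in cel_dir, optionally saving them. Assumes the affy package is installed.
rma_values <- function(cel_dir = getwd(), out_file = NULL) {
  library(affy)
  raw <- ReadAffy(celfile.path = cel_dir)
  eset <- rma(raw)   # background correct, quantile normalize, summarize
  if (!is.null(out_file)) {
    write.exprs(eset, file = out_file)   # tab-delimited table of log2 values
  }
  # After normalization the boxplots should look very similar across arrays.
  boxplot(exprs(eset), las = 2, main = "RMA log2 expression")
  invisible(eset)
}
```

For example, eset <- rma_values(out_file = "rma.txt") would keep the normalized values in R and also save them to a text file.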

You should stop when you get to the Linear Model section, which uses material that we will cover tomorrow.

You will probably have time to get a start on your final report. You can save your work for easy loading tomorrow; see the section Save Your Work near the bottom of the Documentation page.

 

Day 2

 

Today, continue the analysis of the estrogen data set by using linear modeling and empirical Bayes statistics to look for genes affected by estrogen.  You might also explore other specific questions, such as which genes are affected only at one of the two times.
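
To make the idea of a design matrix and contrasts concrete, here is a minimal sketch in R. It assumes eight arrays, two replicates for each of four conditions, with the group labels EstAbsent10, EstPresent10, EstAbsent48 and EstPresent48 (times of 10 and 48 hours); check these against the Target column of your own targets file. The names contrast_matrix and fit_estrogen are hypothetical, and the final function assumes that the limma package is installed.

```r
# Assumed group labels for the eight arrays (two replicates per condition);
# in practice take these from the Target column of the targets file,
# in the same order as the arrays.
groups <- c("EstAbsent10", "EstPresent10", "EstAbsent48", "EstPresent48")
target <- factor(rep(groups, each = 2), levels = groups)

# One column per condition, so each coefficient is a group mean (log2 scale).
design <- model.matrix(~ 0 + target)
colnames(design) <- levels(target)

# Contrasts of interest, written as weights on the four group means.
contrast_matrix <- cbind(
  Est10       = c(-1,  1,  0,  0),   # estrogen effect at 10 hours
  Est48       = c( 0,  0, -1,  1),   # estrogen effect at 48 hours
  Interaction = c( 1, -1, -1,  1)    # difference between the two effects
)
rownames(contrast_matrix) <- colnames(design)

# Hypothetical helper: fit the linear model and compute empirical Bayes
# statistics for an expression matrix or ExpressionSet whose columns are
# in the same order as `target`. Assumes limma is installed.
fit_estrogen <- function(eset) {
  fit <- limma::lmFit(eset, design)
  fit <- limma::contrasts.fit(fit, contrast_matrix)
  limma::eBayes(fit)   # moderated t-statistics
}
```

With normalized values in eset, limma::topTable(fit_estrogen(eset), coef = "Est10", number = 50) would list the top 50 genes for the estrogen effect at the earlier time. The Interaction contrast is one place to start when asking which genes respond differently at the two times.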

Start affylmGUI in R. You should be able to load your saved work from yesterday from the affylmGUI File menu -> Open. Open a web browser and go to the Documentation page (http://bioinf.wehi.edu.au/affylmGUI/ then click on Documentation (with screenshots) -> Estrogen data set). Scroll down past the end of the Normalization section to the Linear Model section and work through the example.

You might also take a few minutes to see what kind of HTML report can be quickly generated: go to the File menu and choose Export HTML Report. This is a quick, but not very flexible, way to output results (and it will not be accepted as your final report).

If you are already familiar with R (not covered in this course), you might like to try more flexible data analyses. The practical exercises at http://www.isrec.isb-sib.ch/~darlene/gda/sched-tp.html are a good place to start if you are interested.