NCCR Plant Survival - Statistics for cDNA Microarrays, Exercises

These exercises give you practical experience using the Bioconductor R package limmaGUI to analyze cDNA (or other two-channel) microarray data. They are based on the swirl zebrafish experiment (more on this below). We will loosely follow the limmaGUI documentation, which includes worked examples for this and several other data sets.

After the course, turn in a short lab report (at most 6 pages) on your investigation of the swirl data set. The goal of the study is to identify genes that are differentially expressed between wild-type (normal) and mutant zebrafish. Write your report for the lead scientist in the study: give enough information to show that you understand the course material, and justify any data analysis decisions the scientist will need to take. You can turn in your report up to 2 weeks after the end of the module. Please email it (preferably as a PDF file) to Darlene.Goldstein@epfl.ch; check ahead of time if you need to use a different format.

Your report will be evaluated on: overall presentation; statement of background and study objectives; summary of quality assessment (including supporting graphs); description of the statistical analyses carried out (including any models fitted, the design matrix, contrasts if necessary, etc.); (apparent) correctness of results (including a table of the genes you find to be differentially expressed and, in any case, the top 50 genes); and conclusions. You cannot get full credit by turning in the HTML report that limmaGUI can generate.

Day 1

First, open a web browser, go to http://bioinf.wehi.edu.au/limmaGUI/ and click on Documentation (with screenshots) -> Swirl Zebrafish data set. Read the background information at the top of the page. Next, download the swirl data set from http://bioinf.wehi.edu.au/limmaGUI/DataSets.html .

To make it easier to switch between web pages, open a second browser window and go to http://bioinf.wehi.edu.au/marray/ibc2004/lab1/lab1.html (the Lab 1 page). Scroll down and read section 1.3 (Details on the files used) and the beginning of section 2 (Swirl experiment). When you have finished, start R (Start menu -> Programs -> R 2.4.1) and work through section 2.1 (Reading the data using limmaGUI). Load limmaGUI in R either from the Packages menu -> Load package ... -> limmaGUI -> OK, or by typing

library(limmaGUI)

Read in the swirl data (as demonstrated on the Lab 1 page or, in more detail, on the limmaGUI documentation page), including the GAL file, the Targets file and the Spot Types file. Once the data are loaded, carry out some exploratory data analyses (section 2.2: Diagnostic plots and normalization using limmaGUI). Make sure you do these for every slide, not just for the single-slide examples on the Lab 1 or Documentation pages.
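Several of the diagnostic plots, MA plots in particular, are built from two per-spot quantities: the log ratio M = log2(R/G) and the average log intensity A = (log2 R + log2 G)/2, where R and G are the background-corrected red and green intensities. The base-R sketch below computes them for a few made-up spots so you can check what the axes of an MA plot mean. The helper name `ma_values` and the plain subtraction of background are assumptions for illustration, not limmaGUI's internal code.

```r
# Hypothetical helper (not limmaGUI code): M and A values for one slide.
# R, G: foreground red/green intensities; Rb, Gb: background intensities.
ma_values <- function(R, G, Rb = 0, Gb = 0) {
  Rc <- R - Rb                     # simple background subtraction (assumed)
  Gc <- G - Gb
  Rc[Rc <= 0] <- NA                # log of a non-positive value is undefined
  Gc[Gc <= 0] <- NA
  M <- log2(Rc) - log2(Gc)         # log ratio, red over green
  A <- (log2(Rc) + log2(Gc)) / 2   # average log intensity
  data.frame(M = M, A = A)
}

# Three made-up spots, not swirl data
ma <- ma_values(R = c(1000, 4000, 256), G = c(1000, 1000, 1024))
print(ma)

# MA plot: M against A, with a dashed reference line at M = 0
if (interactive()) {
  plot(ma$A, ma$M, xlab = "A", ylab = "M", pch = 16)
  abline(h = 0, lty = 2)
}
```

Spots whose background exceeds the foreground get missing values here; limma's `backgroundCorrect` offers alternatives such as `method = "normexp"` that keep corrected intensities positive.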

Take time at this stage to assess the plots and make sure you understand what information each type of plot gives you. By the end of this exploratory phase, you should have normalized log ratios (M values) for each slide. Do NOT continue to the section on Computing A Linear Model Fit (Documentation page). It is a good idea to save your work at this point; see Saving and Exiting at the end of section 2.2 on the Lab 1 page.
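Within-array normalization removes intensity-dependent dye bias, which shows up as a curved trend of M against A in the MA plot. In limma, `normalizeWithinArrays` defaults to print-tip loess, which fits a separate curve for each print-tip group. The base-R sketch below shows the simpler global version: fit one lowess curve of M on A for the whole slide and subtract it. The helper name, the span `f = 0.3` and the simulated bias are assumptions for illustration only.

```r
# Hypothetical helper: global (not print-tip) lowess normalisation of one slide.
normalize_global_lowess <- function(M, A, f = 0.3) {
  ok <- is.finite(M) & is.finite(A)       # fit only spots with usable values
  fit <- lowess(A[ok], M[ok], f = f)      # fitted curve, returned sorted by A
  # evaluate the curve at every spot's A value (linear interpolation)
  trend <- approx(fit$x, fit$y, xout = A, rule = 2, ties = mean)$y
  M - trend                               # normalised log ratios
}

# Simulated slide with an intensity-dependent bias (made-up data)
set.seed(1)
A <- runif(500, min = 6, max = 14)
M <- 0.4 * (A - 10) + rnorm(500, sd = 0.2)
M_norm <- normalize_global_lowess(M, A)
summary(M_norm)
```

After normalization the MA plot should scatter around M = 0 with no trend in A; print-tip loess applies the same idea within each print-tip group.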

If you have time, you might want to download some of the other example data sets (see the limmaGUI documentation page) and carry out some exploratory data analyses on those as well. You might also want to get a start on your final report.

Day 2

Today, continue the analysis of the swirl data set by using linear modeling and empirical Bayes statistics to look for differentially expressed genes.
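In the swirl experiment each slide compares swirl with wild type, with the dyes swapped between slides. Assuming the Targets file lists swirl in Cy3 on slides 1 and 3 and in Cy5 on slides 2 and 4 (check this against your Targets file), M = log2(Cy5/Cy3) estimates swirl minus wild type on slides 2 and 4 and the reverse on slides 1 and 3, so a single-coefficient design vector is (-1, 1, -1, 1) and each gene's least-squares estimate is the average of its sign-corrected M values. limma's empirical Bayes step (`eBayes`) then replaces each gene's residual variance s² by a moderated value (d0·s0² + d·s²)/(d0 + d) that borrows strength across genes. The base-R sketch below uses fixed, hypothetical values for d0 and s0²; limma estimates them from all genes.

```r
# Sketch: gene-wise linear model for a 4-slide dye-swap design, plus a
# moderated t-statistic with FIXED (hypothetical) prior values d0 and s0_sq.
moderated_t <- function(Mmat, design, d0 = 4, s0_sq = 0.05) {
  n <- ncol(Mmat)
  coef <- drop(Mmat %*% design) / sum(design^2)    # least-squares estimate per gene
  fitted <- outer(coef, design)                    # fitted M values, genes x slides
  d <- n - 1                                       # residual df (one coefficient)
  s_sq <- rowSums((Mmat - fitted)^2) / d           # ordinary residual variance
  s_tilde_sq <- (d0 * s0_sq + d * s_sq) / (d0 + d) # moderated variance
  se <- sqrt(s_tilde_sq / sum(design^2))           # standard error of coef
  t_mod <- coef / se
  p <- 2 * pt(-abs(t_mod), df = d0 + d)            # two-sided p-value
  data.frame(coef = coef, t = t_mod, p.value = p)
}

design <- c(-1, 1, -1, 1)   # assumed dye-swap layout (see above)
# Made-up M values for three genes on four slides
Mmat <- rbind(
  gene1 = c(-2.0,  2.1, -1.9,  2.0),  # consistently higher in swirl
  gene2 = c( 0.1, -0.1,  0.0,  0.1),  # no change
  gene3 = c( 1.0,  1.0,  1.0,  1.0)   # dye effect only; cancels in the estimate
)
res <- moderated_t(Mmat, design)
print(res)
```

Gene 3 shows why dye swaps help: a constant red-over-green bias contributes nothing to the estimated swirl effect.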

Start limmaGUI in R. You should be able to load your saved work from yesterday via the limmaGUI File menu -> Open. Open a web browser and go to the Documentation page (http://bioinf.wehi.edu.au/limmaGUI/, then click on Documentation (with screenshots) -> Swirl Zebrafish data set). Scroll down to Computing A Linear Model Fit and work through the example. When you reach Analyzing The Results Of A Linear Model Fit, make all of the appropriate plots. You can skip the part about Evaluating R Code. You may find it useful to save your work when you have finished.
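Thousands of genes are tested at once, so raw p-values overstate the evidence when you pick out differentially expressed genes. limma's `topTable` adjusts p-values by default with the Benjamini-Hochberg false discovery rate method, which is also available in base R as `p.adjust(method = "BH")`. The sketch below ranks made-up gene-level results and keeps the top n, much as you would for the top-50 table in your report; the helper name and the data are hypothetical.

```r
# Hypothetical helper: rank genes by p-value and add BH-adjusted p-values.
top_genes <- function(res, n = 50) {
  res$adj.P.Val <- p.adjust(res$p.value, method = "BH")  # FDR adjustment
  ranked <- res[order(res$p.value), , drop = FALSE]        # smallest p first
  head(ranked, n)
}

# Made-up results for five genes (not swirl output)
res <- data.frame(
  coef      = c(1.8, -0.2, 0.9, -1.5, 0.1),
  p.value   = c(0.001, 0.60, 0.02, 0.004, 0.90),
  row.names = paste0("gene", 1:5)
)
tt <- top_genes(res, n = 3)
print(tt)
```

The adjustment must be computed over all genes before truncating to the top n; adjusting only the displayed rows would understate the false discovery rate.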

You can also work through the section Exporting an HTML Report; it takes only a few minutes. This is a quick but inflexible way to output results, and it may not be used as your final report.

Again, if you have time, you can work through some of the other example data sets (they vary in complexity), or continue working on your final report.

If you are already familiar with R (which is not covered in this course), you might like to try more flexible data analyses. If so, try working through some of the practical exercises at http://www.isrec.isb-sib.ch/~darlene/gda/sched-tp.html .