Statistics for cDNA Microarrays – Exercises
These exercises are designed to give you some practical experience using the Bioconductor R package limmaGUI to carry out analyses of cDNA (or other two-channel) microarray data. They are based on the swirl zebrafish experiment (more on this below). We will loosely follow the limmaGUI documentation, which has a worked example based on this data set (as well as on others).
After the course is completed, you should turn in a short lab report (6 pages maximum) on your investigation of the swirl data set. The goal of the study is to identify genes that are differentially expressed between wild-type (normal) and mutant zebrafish. Write your report for the lead scientist in the study, giving enough information to make clear your understanding of the material from the course and sufficient justification for any data analysis decisions the scientist will need to take. You can turn in your report up to 2 weeks after the end of the module. Please email your report (preferably as a PDF file) to Darlene.Goldstein@epfl.ch; check ahead of time if you need to use a different format.
Your report will be evaluated taking into account: overall presentation; statement of background and study objectives; summary of quality assessment (including supporting graphs); description of statistical analyses carried out (including description of any models fitted, design matrix, contrasts if necessary, etc.); (apparent) correctness of results (including some kind of table giving genes that are differentially expressed, along with the top 50 genes in any case); and conclusions. You cannot get full credit by turning in the HTML report you can generate with limmaGUI.
Day 1
First, open a web browser and go to http://bioinf.wehi.edu.au/limmaGUI/ then click on Documentation (with screenshots) -> Swirl Zebrafish data set. Read the background information at the top of the page.
Next, you should download the swirl data set from http://bioinf.wehi.edu.au/limmaGUI/DataSets.html .
To make it easier to switch between different web pages, you can open a second browser window and go to http://bioinf.wehi.edu.au/marray/ibc2004/lab1/lab1.html (the Lab 1 page). Scroll down and read section 1.3: Details on the files used and the beginning of section 2: Swirl experiment.
When you have finished, start R (Start menu -> Programs -> R 2.4.1) and work through section 2.1: Reading the data using limmaGUI. Load limmaGUI in R either from the Packages menu -> Load package ... -> limmaGUI -> OK, or by typing
library(limmaGUI)
Read in the swirl data (as demonstrated on the Lab 1 page or, in more detail, on the limmaGUI documentation page), including the GAL file, the Targets file and the Spot Types file. Once you have successfully loaded the data, carry out some exploratory data analyses (section 2.2: Diagnostic plots and normalization using limmaGUI). Make sure that you do these for each of the slides, not just for the single-slide examples on the Lab 1 or Documentation pages.
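
For reference, limmaGUI is a graphical front end to the limma package, so the data-reading step can also be sketched at the R command line. The helper below is a minimal sketch, not the procedure limmaGUI itself runs: the function name read_swirl and its arguments are hypothetical, and the default file names are assumptions that must be checked against the files you actually downloaded. It assumes limma is installed and that the image-analysis files are in SPOT format.

```r
# Hypothetical helper: read a two-channel data set with limma.
# The default file names are ASSUMPTIONS; replace them with the names
# of the Targets, GAL and Spot Types files in your own download.
read_swirl <- function(data_dir,
                       targets_file   = "SwirlSample.txt",
                       gal_file       = "fish.gal",
                       spottypes_file = "SpotTypes.txt") {
  # Targets file: one row per slide, giving the file name and the
  # RNA sample hybridized to each channel.
  targets <- limma::readTargets(targets_file, path = data_dir)

  # Read red/green foreground and background intensities for all slides.
  RG <- limma::read.maimages(targets, source = "spot", path = data_dir)

  # GAL file: gene annotation and position of every spot on the array.
  RG$genes   <- limma::readGAL(gal_file, path = data_dir)
  RG$printer <- limma::getLayout(RG$genes)

  # Spot Types file: labels control spots versus ordinary genes.
  spottypes <- limma::readSpotTypes(spottypes_file, path = data_dir)
  RG$genes$Status <- limma::controlStatus(spottypes, RG)

  RG
}

# Example call (the directory is a placeholder):
# RG <- read_swirl("path/to/swirl")
```

The returned object holds the raw intensities for all slides at once, which makes it convenient to repeat the same diagnostic plot for every slide.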
You should take time at this stage to assess the plots and make sure that you understand what information each type of plot is giving you. At the end of this exploratory phase, you should have normalized log ratios for each slide. Do NOT continue to the section on Computing A Linear Model Fit (Documentation page). It is a good idea to save your work for this part; see Saving and Exiting at the end of section 2.2 of the Lab 1 page.
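
To make concrete what a normalized log ratio is, here is a small base-R sketch on simulated intensities (made-up numbers, not the swirl data). It computes the log ratio M and the average log intensity A that form the axes of an MA-plot, then removes an intensity-dependent dye bias with a single global loess curve. This is a simplification chosen for illustration: the print-tip loess normalization offered by limmaGUI fits a separate curve within each print-tip group. All variable names and simulation settings are assumptions.

```r
# Simulated two-channel intensities for one slide (hypothetical data).
set.seed(1)
n      <- 2000
A_true <- runif(n, 6, 14)                # average log2 intensity per spot
bias   <- 0.8 - 0.08 * A_true            # made-up intensity-dependent dye bias
M_true <- rnorm(n, mean = 0, sd = 0.3)   # most genes are not differentially expressed
R <- 2^(A_true + (M_true + bias) / 2)    # red channel intensity
G <- 2^(A_true - (M_true + bias) / 2)    # green channel intensity

# Log ratio (M) and average log intensity (A) for each spot.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Global loess normalization: estimate the trend of M as a function of A
# and subtract it, so that the bulk of the spots is centred on M = 0.
fit    <- loess(M ~ A, span = 0.4, degree = 1)
M_norm <- M - fitted(fit)
```

Calling plot(A, M) and then plot(A, M_norm) shows the effect: the sloping cloud of points in the first plot becomes roughly horizontal around zero in the second.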
If you have time, you might want to download some of the other
example data sets (see the limmaGUI documentation page) and carry out some
exploratory data analyses on those as well.
You might also want to get a start on your final report.
Day 2
Today, continue the analysis of the swirl data set by using linear modeling and empirical Bayes statistics to look for differentially expressed genes.
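
As a rough illustration of what this step computes, the sketch below runs limma's lmFit, eBayes and topTable functions on a simulated matrix of normalized log ratios (made-up values, not the swirl data). It assumes limma is installed and assumes a four-slide dye-swap layout in which the mutant sample is in the red channel on slides 2 and 4 and in the green channel on slides 1 and 3; check your own Targets file before reusing the design vector. The gene and slide names are hypothetical.

```r
library(limma)  # Bioconductor package underlying limmaGUI

set.seed(2)
n_genes <- 1000

# ASSUMED dye-swap design: +1 where M = log2(mutant / wild type),
# -1 where the dyes are swapped and M = log2(wild type / mutant).
design <- c(-1, 1, -1, 1)

# Simulated normalized log ratios: genes in rows, slides in columns.
M <- matrix(rnorm(n_genes * 4, sd = 0.3), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0("slide", 1:4)))

# Spike in 20 genes with a true log2 fold change of 2 (mutant vs wild type).
de <- 1:20
M[de, ] <- M[de, ] + outer(rep(2, length(de)), design)

# Fit the linear model gene by gene, then moderate the gene-wise
# variances with empirical Bayes before ranking the genes.
fit <- lmFit(M, design)
fit <- eBayes(fit)

# Top 50 genes with moderated t-statistics and adjusted p-values.
top <- topTable(fit, number = 50, adjust.method = "BH")
head(top)
```

In this simulation the spiked-in genes should appear at the head of the table with log fold changes close to 2; with real data the ranking is the starting point for deciding which genes to report.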