TP 2: Reading in data from Affy chips and computing RMA

In this TP, you will get some practice using the affy package. This package is used for quantifying gene expression for Affymetrix GeneChips. You will also get a chance to work through its Bioconductor vignette. Working through a package's vignette, when one is available, is a good way to learn how to use the package (not all packages have vignettes, though). As usual, make sure you read the help documentation for each function you do not already know.

For these exercises, you will work through some of the examples in the affy vignette. Start R, then load the packages and the Dilution example data by typing

library(affy)
library(affydata)
data(Dilution)

To start the vignette, type

openVignette()

and select the affy primer. A window will open showing the pdf document. Read the Introduction (Section 1), then skip to Section 4. (If openVignette() doesn't work, you can find the vignette here.)

Dilution data

Work through all of Section 4. As a check, when you type

mean(mm(Dilution) > pm(Dilution))

you should get: [1] 0.2746048
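Comparing two matrices of the same size gives a logical matrix, and mean() counts TRUE as 1 and FALSE as 0, so the check above is the fraction of probe pairs whose mismatch (MM) intensity exceeds the perfect-match (PM) intensity. The sketch below shows the mechanics on two small made-up matrices (pm_toy and mm_toy are hypothetical, not taken from Dilution), and uses colMeans() to get the same fraction chip by chip.

```r
# Toy PM and MM intensities: 4 probes (rows) x 2 chips (columns), values made up.
pm_toy <- matrix(c(120, 80, 300, 55,
                   200, 40, 150, 90), nrow = 4)
mm_toy <- matrix(c(100, 95, 250, 60,
                   210, 30, 140, 70), nrow = 4)

# TRUE wherever MM > PM; mean() of a logical matrix is the proportion of TRUEs.
mm_gt_pm <- mm_toy > pm_toy
prop_mm_gt_pm <- mean(mm_gt_pm)
print(prop_mm_gt_pm)

# The same proportion computed separately for each chip (column).
per_chip <- colMeans(mm_gt_pm)
print(per_chip)
```

On the real data, colMeans(mm(Dilution) > pm(Dilution)) should give the corresponding fraction for each of the 4 chips.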

In Section 4.2, make the histogram using all 4 chips (instead of 2 as in the example). For easier viewing, it also helps to increase the width of the density curves that hist() draws. You can do this with the parameter lwd:

hist(Dilution,lwd=2)

You can also choose your own colors (col), line types (lty), etc., to customize the plots. Make sure you know which curve corresponds to which chip.
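One way to keep track of which curve belongs to which chip is to define one color and one line type per chip, and reuse the same vectors in legend(). The sketch below does this in base R on simulated log2 intensities for 4 hypothetical chips (log_int, chip_cols and chip_lty are illustrative names, not affy objects); it draws one density curve per chip, much like hist() does for an AffyBatch.

```r
set.seed(1)
# Simulated log2 intensities for 4 hypothetical chips (not the Dilution data).
log_int <- sapply(c(7, 7.3, 7.8, 8.1), function(m) rnorm(5000, mean = m, sd = 1.2))
colnames(log_int) <- paste0("chip", 1:4)

chip_cols <- c("black", "red", "blue", "darkgreen")  # one color per chip
chip_lty  <- 1:4                                      # one line type per chip

# One kernel density estimate per chip (column).
dens <- lapply(seq_len(ncol(log_int)), function(j) density(log_int[, j]))

# Empty plot sized to fit every curve, then one line per chip.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(NULL, xlim = xlim, ylim = ylim, xlab = "log2 intensity", ylab = "density")
for (j in seq_along(dens)) {
  lines(dens[[j]], col = chip_cols[j], lty = chip_lty[j], lwd = 2)
}

# The legend reuses the same col/lty/lwd vectors, so labels match curves.
legend("topright", legend = colnames(log_int),
       col = chip_cols, lty = chip_lty, lwd = 2)
```

With the real data, the same col, lty and lwd vectors can be passed to hist(Dilution, ...) and then to legend(), using sampleNames(Dilution) as the labels.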

When you make the boxplots, give the first 2 chips one color and the last 2 chips a different color.
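A convenient way to build such a color vector is rep() with each = 2, which repeats every color twice in chip order. The sketch below applies it to a base R boxplot of simulated log2 intensities for 4 hypothetical chips (the data and the color names are illustrative assumptions).

```r
set.seed(2)
# Simulated log2 intensities for 4 hypothetical chips.
log_int <- sapply(c(7, 7.2, 8, 8.2), function(m) rnorm(2000, mean = m, sd = 1))
colnames(log_int) <- paste0("chip", 1:4)

# rep(..., each = 2): chips 1-2 get the first color, chips 3-4 the second.
box_cols <- rep(c("lightblue", "salmon"), each = 2)

# boxplot() on a matrix draws one box per column and invisibly returns the statistics.
bp <- boxplot(log_int, col = box_cols, ylab = "log2 intensity")
```

With the real data, boxplot(Dilution, col = box_cols) should apply the colors in the same chip order.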

It is good practice to remove large objects you no longer need. When you have finished, you can remove the Dilution data by typing

rm(Dilution)
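To decide which objects are worth removing, object.size() reports how much memory an object uses. A minimal sketch, with a hypothetical matrix big standing in for a large object such as an AffyBatch:

```r
# A hypothetical large object: 1 million doubles.
big <- matrix(rnorm(1e6), ncol = 4)

# Report its memory footprint in megabytes.
print(object.size(big), units = "Mb")

# Remove it; afterwards the name no longer exists.
rm(big)
exists("big")
```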

Placenta data

For the exam, you will need to know how to get gene expression measures for Affymetrix GeneChips. This is easily done with the affy package. You might remember that there are several ways of doing this; we will use RMA. It may help to go back to the affy vignette and look through the Quick Start section.

Download the data from http://lausanne.isb-sib.ch/~darlene/gda/DAFLcel.zip into your working directory and unzip DAFLcel.zip. You will then have a directory DAFLcel that contains 5 cel files from hybridizations of human placenta RNA. Read the data in as an AffyBatch (remember to read the help files for new functions), and make histograms of the raw signal values. To read the files, you can change directories to DAFLcel (using setwd in R), use the celfile.path argument of ReadAffy, or move the cel files to your current working directory (maybe the easiest). The code below assumes that your working directory contains the cel files:
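Before calling ReadAffy, it can be worth checking that R actually sees the 5 cel files. The sketch below uses a hypothetical helper, list_cel_files(), built on base R's list.files() with a case-insensitive pattern (file extensions may be .CEL, .cel or gzipped .CEL.gz); it is demonstrated on a temporary directory with made-up file names. affy also provides list.celfiles() for a similar purpose.

```r
# Hypothetical helper: list CEL files in a directory, ignoring case and
# also matching gzipped files ending in .cel.gz.
list_cel_files <- function(path = ".") {
  list.files(path, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE)
}

# Demonstration on a temporary directory with made-up file names.
tmp <- file.path(tempdir(), "cel_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("a.CEL", "b.cel", "c.txt", "d.CEL.gz"))))

found <- list_cel_files(tmp)
print(found)
```

For the placenta data, list_cel_files("DAFLcel") should list 5 files, and ReadAffy(celfile.path = "DAFLcel") reads them without changing directories.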

plac <- ReadAffy()

Explore the structure and class of plac, and find its slot names (you can learn more about the S4 object-oriented programming used in Bioconductor here):

str(plac)
class(plac)
slotNames(plac)
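An AffyBatch is an S4 object: its data are stored in named slots, which is what slotNames() lists. The sketch below defines a tiny hypothetical S4 class, ToyBatch, with two slots (it is not part of affy, and its slots are only loosely inspired by those of an AffyBatch) to show how such objects are created and inspected with the methods package.

```r
library(methods)

# "ToyBatch" is a hypothetical class with two slots, defined only to show
# how S4 objects such as an AffyBatch are inspected.
setClass("ToyBatch",
         slots = c(intensities = "matrix", cdfName = "character"))

tb <- new("ToyBatch",
          intensities = matrix(1:6, nrow = 3),
          cdfName = "toy-cdf")

is(tb, "ToyBatch")       # TRUE
slotNames(tb)            # "intensities" "cdfName"
tb@cdfName               # direct slot access with @
slot(tb, "intensities")  # the same kind of access, by slot name
```

For real Bioconductor objects, accessor functions such as exprs() or pData() are usually preferred over @, because they keep working if the internal slot layout changes.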

Look at the phenoData slot, for example. What information gets put here by default?

Do some graphical exploration of the chip data. You should see that one of the distributions looks very different from the other 4. Make histograms and learn how to put a legend on the plot – this should also come in handy for the exam. Look at

?legend

and see how nice you can make your histogram plot look. After you have tried your best, you can click here to see how I did it.
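A numerical summary can back up what the plots show. The sketch below computes the median and interquartile range of the log2 intensities for each chip and flags the chip whose median lies farthest from the median of all chip medians. It runs on simulated data for 5 hypothetical chips, one deliberately shifted; for plac, log2(intensity(plac)) would be the analogous matrix (an assumption about how you choose to summarize, not a prescribed method).

```r
set.seed(3)
# Simulated log2 intensities for 5 hypothetical chips; chip5 is shifted on purpose.
log_int <- cbind(sapply(1:4, function(i) rnorm(3000, mean = 7, sd = 1)),
                 rnorm(3000, mean = 9, sd = 1.5))
colnames(log_int) <- paste0("chip", 1:5)

# Per-chip summaries of each distribution.
chip_summary <- rbind(median = apply(log_int, 2, median),
                      IQR    = apply(log_int, 2, IQR))
print(round(chip_summary, 2))

# Distance of each chip's median from the median of all chip medians.
med_dev <- abs(chip_summary["median", ] - median(chip_summary["median", ]))
outlier <- names(which.max(med_dev))
print(outlier)
```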

You can also explore the chips with boxplots and images. To display all 5 images at the same time, on a 3-by-2 grid of square panels, type

par(mfrow=c(3,2),pty="s")   # 3 rows x 2 columns of square plotting regions
image(plac)                 # one image per chip
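Settings changed with par() stay in effect for later plots, so it is useful to save and restore them. Calling par() with arguments returns the previous values, which can be passed back afterwards. The sketch below draws 5 simulated "chip images" (hypothetical matrices, not plac) on the same 3-by-2 grid, leaving the sixth panel empty, and then restores the previous layout.

```r
set.seed(4)
# Five simulated 20 x 20 "chip images" (hypothetical data, not plac).
chips <- lapply(1:5, function(i) matrix(rexp(400, rate = i / 5), nrow = 20))

# Save the old settings while setting a 3 x 2 grid of square panels.
old_par <- par(mfrow = c(3, 2), pty = "s")
for (i in seq_along(chips)) {
  image(log2(chips[[i]]), col = gray.colors(64), main = paste("chip", i), axes = FALSE)
}

# Restore the previous layout so later plots are not squeezed into the grid.
par(old_par)
```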

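Now convert the probe-level measurements into summary expression measures (RMA) and extract the expression values:

plac.rma <- rma(plac)          # ExpressionSet of RMA expression values (log2 scale)
plac.exprs <- exprs(plac.rma)  # matrix: one row per probe set, one column per chip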
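rma() combines three steps: background correction, quantile normalization across chips, and a median-polish summary of the log2 probe intensities within each probe set. The sketch below reproduces only the last two steps on a tiny hypothetical matrix (raw, probe_set, quantile_normalize() and summarize_set() are made up for illustration). It is a simplified illustration, not affy's implementation: it skips background correction and does not use the CDF to decide which probes belong to which probe set.

```r
set.seed(5)
# Hypothetical toy data: 3 probe sets with 4 probes each, measured on 3 chips.
n_sets <- 3; n_probes <- 4; n_chips <- 3
probe_set <- rep(paste0("ps", 1:n_sets), each = n_probes)
raw <- matrix(rexp(n_sets * n_probes * n_chips, rate = 1 / 200) + 50,
              ncol = n_chips,
              dimnames = list(NULL, paste0("chip", 1:n_chips)))

# Step 1 (omitted here): RMA starts with a model-based background correction.

# Step 2: quantile normalization. The k-th smallest value on every chip is
# replaced by the mean of the k-th smallest values across chips, so all
# chips end up with the same distribution.
quantile_normalize <- function(x) {
  ref <- rowMeans(apply(x, 2, sort))
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}
norm_int <- quantile_normalize(raw)

# Step 3: log2-transform, then fit median polish (additive probe and chip
# effects) within each probe set; overall + chip effect is the expression
# value of that probe set on each chip.
summarize_set <- function(m) {
  fit <- medpolish(log2(m), maxiter = 100L, trace.iter = FALSE)
  fit$overall + fit$col
}
expr <- t(sapply(split(seq_len(nrow(norm_int)), probe_set),
                 function(idx) summarize_set(norm_int[idx, , drop = FALSE])))
print(round(expr, 2))
```

The result has one row per probe set and one column per chip, the same layout as plac.exprs.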

The resulting expression values are preprocessed data, ready for downstream analysis. We won't analyze them now, but later you will use limma to analyze experiments done with Affymetrix GeneChips.

Report

For practice, you can write a short report on your exploration of the DAFL chips. Include brief background on the experiment, some graphical explorations, and a description of the expression summary (RMA). The report should be no more than 2 pages. It should be in PDF format and can be in either English or French. Please name your report file lastname.pdf (for example, mine would be goldstein.pdf). You can email your report to me (darlene.goldstein at epfl.ch) by 15.00 on Thursday (6 October 2016).