These are R data objects containing the breast cancer datasets used in the
paper

    Wirapati et al (2008) Breast Cancer Research 10:R65.

The datasets are described in the paper, and in the technical report and its
supplementary methods in the 'docs' subdirectory.

Files:

docs/
    bcfsib2007-1.pdf -> tech report (see in particular table 1 and figure 1)
    supmethods.pdf
    supresults.pdf
    wirapati2008breastcancerres10-R65.pdf

general.rda
    This contains
    'clin.info': clinical variable description
    'gene.info': gene description (symbols, Entrez geneid, description)
    'dataset':   strings that identify the datasets

*.rda
    These are the individual datasets. Each contains a list with a name
    such as EMC, BWH, etc. Each list contains three objects:

    'clin'      a data.frame of clinical variables; the encoding of the
                variables is described in 'clin.info' above.
    'probeset'  a data.frame of gene symbol, geneid, and original probeset
                name
    'gex'       a gene expression matrix (the original authors' processed
                values), where the rows are the tumors and the columns are
                the genes. Values can be indexed by the strings used as row
                keys in 'clin' (tumor id) and 'probeset' (gsymbol), for
                example

                    EMC$gex["3","ESR1"]
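
The layout described above can be tried out without the data files. The
following is a minimal sketch that builds a toy list with the same three
objects and indexes it by tumor id and gene symbol. All values, the clinical
column names ('er', 'age'), the column names of 'probeset', and the probeset
names are made up for illustration; the real files may differ.

```r
# Toy stand-in for one dataset list (hypothetical values, not real data).
# With the real files one would presumably start with something like
#   load("EMC.rda")   # file name is an assumption
clin <- data.frame(
  er  = c(1, 0, 1),            # hypothetical clinical variables
  age = c(52, 47, 63),
  row.names = c("1", "2", "3") # tumor ids are the row keys
)
probeset <- data.frame(
  gsymbol  = c("ESR1", "ERBB2"),
  geneid   = c(2099, 2064),
  probeset = c("probe_a", "probe_b"), # hypothetical probeset names
  row.names = c("ESR1", "ERBB2"),     # gene symbols are the row keys
  stringsAsFactors = FALSE
)
# Rows are tumors, columns are genes; matrix() fills column by column.
gex <- matrix(
  c(1.2, -0.4, 2.1,   # ESR1 for tumors 1, 2, 3
    0.3,  1.8, -0.7), # ERBB2 for tumors 1, 2, 3
  nrow = 3, ncol = 2,
  dimnames = list(rownames(clin), rownames(probeset))
)
EMC <- list(clin = clin, probeset = probeset, gex = gex)

# Single value: tumor "3", gene "ESR1".
value <- EMC$gex["3", "ESR1"]

# The same keys select the matching rows in 'clin' and 'probeset'.
tumor_row <- EMC$clin["3", ]
gene_row  <- EMC$probeset["ESR1", ]

# One gene across all tumors, as a named vector keyed by tumor id.
esr1 <- EMC$gex[, "ESR1"]
```

Note that the tumor ids are character strings: EMC$gex["3", ] selects the row
named "3", whereas EMC$gex[3, ] selects the third row by position. The two
coincide only when the rows happen to be stored in that order.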