Computational and statistical issues in co-analyzing
gene expression data from multiple studies and platforms
Recently, expression data from cancer studies have been accumulating
rapidly in public databases. For example, in breast cancer, data from
more than 3000 arrays are available. Co-analyzing them promises
higher statistical power and more reproducible conclusions. The most
commonly cited problem is the lack of comparability between expression
measures. On the other hand, in classical meta-analysis, the main issue
in multi-cohort analyses is Simpson's paradox, which
precludes pooling and direct comparison of data across cohorts, even if
the measured variables are comparable. The resulting requirement of
stratified analysis, in which samples are compared only within each
cohort, also solves the problem of expression measure comparability.
However, many standard microarray analyses, such as clustering,
significance analysis, and prediction analysis, need to be redesigned and
reimplemented to incorporate stratified analysis.
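
As a minimal illustration of why pooling can mislead, the Python sketch
below uses hypothetical toy values for one gene in two cohorts (invented
numbers, not data from any study). The cohorts differ in measurement scale
and in case/control balance, so the pooled case-control difference has the
opposite sign to the difference seen within each cohort. All names
(cohorts, pooled_difference, stratified_difference) are illustrative
assumptions, and inverse-variance weighting is only one common way of
combining per-cohort estimates, not necessarily the approach intended here.

```python
from statistics import mean, variance

# Hypothetical expression values for a single gene (toy numbers).
# Cohort B is measured on a platform with a higher baseline, and the
# two cohorts have very different case/control proportions.
cohorts = {
    "A": {
        "case": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7],
        "control": [4.0, 4.2],
    },
    "B": {
        "case": [10.0, 10.2],
        "control": [9.0, 9.2, 8.8, 9.1, 8.9, 9.0, 9.3, 8.7],
    },
}


def mean_difference(case, control):
    """Case mean minus control mean."""
    return mean(case) - mean(control)


def difference_variance(case, control):
    """Estimated variance of the mean difference (unequal variances)."""
    return variance(case) / len(case) + variance(control) / len(control)


def pooled_difference(cohorts):
    """Naive analysis: merge all cohorts, then compare cases to controls."""
    cases = [x for c in cohorts.values() for x in c["case"]]
    controls = [x for c in cohorts.values() for x in c["control"]]
    return mean_difference(cases, controls)


def stratified_difference(cohorts):
    """Stratified analysis: estimate the difference within each cohort,
    then combine the estimates with inverse-variance weights."""
    effects, weights = [], []
    for c in cohorts.values():
        effects.append(mean_difference(c["case"], c["control"]))
        weights.append(1.0 / difference_variance(c["case"], c["control"]))
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


per_cohort = {
    name: mean_difference(c["case"], c["control"])
    for name, c in cohorts.items()
}
pooled = pooled_difference(cohorts)
stratified = stratified_difference(cohorts)

print("per-cohort differences:", per_cohort)
print("pooled difference:", pooled)
print("stratified difference:", stratified)
```

In this toy setting the within-cohort differences are both positive while
the pooled difference is negative. The stratified estimate never compares
a value from one cohort with a value from the other, so the difference in
measurement scale between cohorts does not enter it.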