Co-analyzing datasets from multiple cancer studies: incorporating
hierarchical models into differential expression, prediction
and cluster analysis
Publicly available clinical and genomics data are rapidly accumulating and the
increased sample sizes promise more stable and consolidated results from
genome-wide studies. However, combined analyses are still hampered by
incommensurabilities due to disparate measurement platforms, data
representation and study designs. Hierarchical sampling models (such as those
based on meta-analysis, empirical Bayes or random-effect/random-coefficient
models) should naturally be used to account for between-study heterogeneities.
Although some solutions for two-sample differential expression problems have
been proposed, extensions to other outcome types, such as survival, remain
unclear. Furthermore, existing methods for more complex analysis modes, such as
prediction and cluster analysis, assume single-study (one-level sampling)
models. I will present a framework for modifying these commonly used
"expression analysis workhorses" to accommodate datasets from multiple
studies.
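
As a minimal illustration of the kind of two-level model mentioned above, the sketch below pools one gene's per-study effect estimates with a random-effects meta-analysis using the DerSimonian-Laird moment estimator. It is only an assumed example of a hierarchical approach, not the framework presented in the talk; the function name `dersimonian_laird` and its inputs (per-study effect estimates and their sampling variances) are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects under a random-effects model.

    effects   -- one effect estimate per study (e.g. a log fold change)
    variances -- the within-study sampling variance of each estimate
    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need matching lists with at least two studies")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of between-study variance, truncated at 0.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return pooled, se, tau2
```

When the studies agree, the estimated between-study variance is zero and the result reduces to the usual inverse-variance pooled estimate; when they disagree, the weights flatten and the standard error grows, which is the behaviour a single-study (one-level) model cannot reproduce.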