Co-analyzing datasets from multiple cancer studies: incorporating hierarchical models into differential expression, prediction and cluster analysis

Publicly available clinical and genomic data are accumulating rapidly, and the increased sample sizes promise more stable and consolidated results from genome-wide studies. However, combined analyses are still hampered by incompatibilities arising from disparate measurement platforms, data representations and study designs. Hierarchical sampling models (such as those based on meta-analysis, empirical Bayes or random-effect/random-coefficient models) are a natural way to account for between-study heterogeneity. Although some solutions have been proposed for two-sample differential expression problems, extensions to other data types, such as survival data, remain unclear. Furthermore, existing methods for more complex modes of analysis, such as prediction and cluster analysis, assume single-study (one-level sampling) models. I will present a framework for modifying these commonly used "expression analysis workhorses" to accommodate datasets from multiple studies.
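
As a rough illustration of what a two-level (hierarchical) model for between-study heterogeneity can look like, the sketch below pools one gene's per-study effect estimates under a random-effects meta-analysis. It uses the DerSimonian-Laird moment estimator, which is one standard choice and is not claimed to be the framework presented in the talk. The function name `dersimonian_laird` and its inputs (per-study effect estimates and their sampling variances) are assumptions made for this example.

```python
from math import sqrt


def dersimonian_laird(effects, variances):
    """Pool per-study effects under a random-effects model.

    effects   -- per-study effect estimates for one gene (e.g. log fold changes)
    variances -- the within-study sampling variances of those estimates
    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q measures how much the studies disagree.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Moment estimate of the between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = sqrt(1.0 / sw_star)
    return pooled, se, tau2
```

When the studies agree, the estimated between-study variance is zero and the result reduces to the usual inverse-variance pooled estimate. When they disagree, the weights become more equal and the standard error grows, which is the sense in which the second sampling level is accounted for.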