Bioinformatics Core Facility
Swiss Institute of Bioinformatics
Quartier Sorge - Batiment Genopode, CH-1015 Lausanne, Switzerland
Eva Budinska

+41 (0) 788 750 630
+41 (0) 21 692 40 97
Eva.Budinska@isb-sib.ch
MSMAD: a computationally efficient method for analysis of noisy array CGH data.

We have developed a new nonparametric method for breakpoint detection in array CGH data that is both simple and computationally highly efficient. The method is called MSMAD, short for the Median Smoothing Median Absolute Deviation method.

The method rests on two assumptions: copy number changes are rank-order dependent, and they appear as jumps in the sequence of log2 ratios.
The proposed algorithm consists of three steps:

1) Median smoothing of the data (Eilers & de Menezes, 2005)
2) MAD-based double-step breakpoint detection
3) Merging of segments (procedure MergeLevels; Willenbrock & Fridlyand, 2005)
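Step 3 relies on the MergeLevels procedure of Willenbrock & Fridlyand (2005), which is more involved than can be reproduced here. To illustrate only the general idea of merging, the minimal Python sketch below uses a simplified, hypothetical rule instead of MergeLevels: it repeatedly merges the pair of adjacent segments whose medians are closest, as long as that difference stays below a user-chosen threshold. The function name merge_similar_segments, the breakpoint convention (index i means "a segment ends after position i") and the parameter min_diff are all assumptions, not part of MSMAD.

```python
from statistics import median


def merge_similar_segments(values, breakpoints, min_diff):
    """Simplified, hypothetical merging step (a stand-in, NOT MergeLevels).

    values      -- log2 ratios ordered by genomic position
    breakpoints -- sorted indices i meaning "a segment ends after position i"
    min_diff    -- hypothetical threshold: adjacent segments whose medians
                   differ by less than this value are merged
    Returns the breakpoints that survive merging.
    """
    # Convert breakpoints into half-open segment bounds [start, end).
    bounds = []
    start = 0
    for b in breakpoints:
        bounds.append([start, b + 1])
        start = b + 1
    bounds.append([start, len(values)])

    merged = True
    while merged and len(bounds) > 1:
        merged = False
        # Find the adjacent pair of segments with the closest medians.
        best_j, best_gap = None, None
        for j in range(len(bounds) - 1):
            left = median(values[bounds[j][0]:bounds[j][1]])
            right = median(values[bounds[j + 1][0]:bounds[j + 1][1]])
            gap = abs(left - right)
            if best_gap is None or gap < best_gap:
                best_j, best_gap = j, gap
        if best_gap < min_diff:
            # Absorb the right neighbour, then re-evaluate all pairs.
            bounds[best_j][1] = bounds[best_j + 1][1]
            del bounds[best_j + 1]
            merged = True

    # Every segment except the last one ends at a breakpoint.
    return [end - 1 for _, end in bounds[:-1]]
```

For example, with three segments at levels 0, 0.05 and 1 and min_diff = 0.2, the first two segments are merged and only the breakpoint in front of the level-1 segment remains.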

Initial smoothing of array CGH data improves the identification of genomic aberrations. Quantile smoothing (Eilers & de Menezes, 2005) is therefore the first step of our algorithm. Here we fix the quantile at 0.5, i.e. we apply median smoothing.
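In our reading of Eilers & de Menezes (2005), quantile smoothing of log2 ratios y_1, ..., y_n ordered by genomic position finds fitted values z_i minimising sum_i rho_tau(y_i - z_i) + lambda * sum_i |z_i - z_{i-1}|, where rho_tau is the quantile check function; tau = 0.5 gives median smoothing. The minimal Python sketch below minimises this objective exactly. The solver (dynamic programming over the observed values) is our own choice for a dependency-free illustration and not necessarily what the MSMAD R code or the original paper uses; the names quantile_smooth, check_loss and lam are hypothetical, with lam playing the role of the smoothing parameter.

```python
def check_loss(u, tau):
    """Quantile (check) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_smooth(y, lam, tau=0.5):
    """Minimise sum_i rho_tau(y_i - z_i) + lam * sum_i |z_i - z_{i-1}|.

    With tau = 0.5 this is median smoothing. The objective is solved exactly
    by dynamic programming over candidate levels taken from the observed
    values (for this L1-type objective an optimal solution using only these
    levels exists). Runs in O(n * m) time, m = number of distinct values.
    y must be non-empty.
    """
    levels = sorted(set(y))
    m = len(levels)
    # cost[k]: best objective for the prefix so far, ending at level k.
    cost = [check_loss(y[0] - v, tau) for v in levels]
    back = []  # back[i][k]: level index of the previous probe if this one is k
    for yi in y[1:]:
        # L1 distance transform: best[k] = min_j cost[j] + lam * |v_k - v_j|,
        # computed with one forward and one backward sweep over sorted levels.
        best = cost[:]
        arg = list(range(m))
        for k in range(1, m):
            cand = best[k - 1] + lam * (levels[k] - levels[k - 1])
            if cand < best[k]:
                best[k], arg[k] = cand, arg[k - 1]
        for k in range(m - 2, -1, -1):
            cand = best[k + 1] + lam * (levels[k + 1] - levels[k])
            if cand < best[k]:
                best[k], arg[k] = cand, arg[k + 1]
        back.append(arg)
        cost = [best[k] + check_loss(yi - levels[k], tau) for k in range(m)]
    # Backtrack from the cheapest level at the last probe.
    k = min(range(m), key=lambda j: cost[j])
    z = [levels[k]]
    for arg in reversed(back):
        k = arg[k]
        z.append(levels[k])
    return z[::-1]
```

With lam = 1.0, for instance, quantile_smooth([0, 0, 0, 5, 0, 0, 0], 1.0) removes the isolated outlier, while a sustained step such as [0, 0, 0, 1, 1, 1] survives unchanged. Because the running time grows as O(n * m), a dedicated solver would be a better fit for long arrays.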

Breakpoint detection is a double-step process. It combines breakpoints detected on the non-smoothed data, which captures small regions, with breakpoints detected on data smoothed with a larger smoothing parameter l, which allows larger regions to be located precisely in noisy data. The method was implemented in R, and the code with example datasets can be downloaded here.
An R package is under development.
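The exact MAD-based detection rule is not spelled out above, so the Python sketch below is only one plausible reading under explicit assumptions: the noise standard deviation is estimated robustly from first differences of the raw log2 ratios as 1.4826 * MAD / sqrt(2); a jump between neighbouring probes is called a breakpoint when it exceeds a multiple of that estimate; and the breakpoints found on the raw data (short aberrations) and on the strongly smoothed data (longer regions) are combined by taking their union. The function names and the multipliers k_raw and k_smooth are hypothetical.

```python
from math import sqrt
from statistics import median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = median(values)
    return median(abs(v - med) for v in values)


def robust_sigma(y):
    """Noise SD estimated from first differences of the raw log2 ratios.

    For i.i.d. noise, consecutive differences have SD sigma * sqrt(2); the
    factor 1.4826 makes the MAD consistent for the SD of normal data.
    """
    diffs = [b - a for a, b in zip(y, y[1:])]
    return 1.4826 * mad(diffs) / sqrt(2)


def jumps(x, threshold):
    """Indices i with |x[i+1] - x[i]| > threshold (a segment ends after i)."""
    return [i for i in range(len(x) - 1) if abs(x[i + 1] - x[i]) > threshold]


def double_step_breakpoints(raw, smoothed, k_raw=4.0, k_smooth=2.0):
    """Hypothetical double-step detection: union of two breakpoint sets.

    raw      -- non-smoothed log2 ratios (targets short aberrations)
    smoothed -- the same data smoothed with a larger parameter l
                (targets longer regions in noisy data)
    k_raw, k_smooth -- hypothetical multipliers of the robust noise SD
    """
    sigma = robust_sigma(raw)
    # Differences of raw data carry noise with SD sigma * sqrt(2).
    small = jumps(raw, k_raw * sqrt(2) * sigma)
    large = jumps(smoothed, k_smooth * sigma)
    return sorted(set(small) | set(large))
```

Taking the union keeps single-probe aberrations that strong smoothing would erase, at the price of possible spurious splits on the raw data, which a subsequent merging step can consolidate.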

  • Eilers, P.H.C. & de Menezes, R.X. (2005) Quantile smoothing of array CGH data. Bioinformatics, 21, 1146-1153.
  • Willenbrock, H. & Fridlyand, J. (2005) A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics, 21, 4084-4091.