Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow and Terence P. Speed (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Vol. 12, No. 1.

Statistica Sinica 12 (2002), 111-139

STATISTICAL METHODS FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN REPLICATED cDNA MICROARRAY EXPERIMENTS

Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow and Terence P. Speed $^{1,3}$

University of California, Berkeley, Lawrence Berkeley National Laboratory and The Walter and Eliza Hall Institute

Abstract: DNA microarrays are a new and promising biotechnology that allows the expression levels of thousands of genes in cells to be monitored simultaneously. The present paper describes statistical methods for the identification of differentially expressed genes in replicated cDNA microarray experiments. Although it is not the main focus of the paper, new methods for the important pre-processing steps of image analysis and normalization are proposed. Given suitably normalized data, the biological question of differential expression is restated as a problem in multiple hypothesis testing: the simultaneous test, for each gene, of the null hypothesis of no association between the expression levels and the responses or covariates of interest. Differentially expressed genes are identified based on adjusted p-values for a multiple testing procedure which strongly controls the family-wise Type I error rate and takes into account the dependence structure between the gene expression levels. No specific parametric form is assumed for the distribution of the test statistics, and a permutation procedure is used to estimate adjusted p-values. Several data displays are suggested for the visual identification of differentially expressed genes and of important features of these genes. The above methods are applied to microarray data from a study of gene expression in the livers of mice with very low HDL cholesterol levels. The genes identified using data from multiple slides are compared to those identified by recently published single-slide methods.

Key words and phrases: Adjusted p-value, differential gene expression, DNA microarray, image analysis, multiple testing, normalization, permutation test.
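
As an illustration of the kind of permutation-based adjustment described in the abstract, the following is a minimal sketch of single-step maxT adjusted p-values in the style of Westfall and Young. It is not necessarily the exact procedure used in the paper: the choice of the Welch t-statistic, the two-group design, and the names `welch_t` and `maxt_adjusted_pvalues` are assumptions made here for illustration. Slide labels are permuted jointly for all genes, so the dependence between genes is preserved in the permutation distribution.

```python
import math
import random


def welch_t(x, y):
    """Two-sample Welch t-statistic comparing the means of x and y."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    # Guard against zero variance in both groups (assumption: treat as no evidence).
    return (mx - my) / se if se > 0 else 0.0


def maxt_adjusted_pvalues(data, labels, n_perm=2000, seed=0):
    """Single-step maxT adjusted p-values estimated by random permutation.

    data:   list of genes; each gene is a list of values, one per slide
    labels: list of 0/1 group labels, one per slide
    """
    rng = random.Random(seed)

    def abs_stats(lab):
        # Absolute t-statistic for every gene under the labelling `lab`.
        out = []
        for row in data:
            g0 = [v for v, l in zip(row, lab) if l == 0]
            g1 = [v for v, l in zip(row, lab) if l == 1]
            out.append(abs(welch_t(g1, g0)))
        return out

    observed = abs_stats(labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        # Permute slide labels once and reuse them for all genes,
        # which keeps the gene-gene dependence structure intact.
        rng.shuffle(perm)
        max_t = max(abs_stats(perm))
        for j, t in enumerate(observed):
            if max_t >= t:
                exceed[j] += 1
    # Proportion of permutations whose maximum statistic reaches each observed one.
    return [c / n_perm for c in exceed]
```

Calling `maxt_adjusted_pvalues(data, labels)` returns one adjusted p-value per gene. Genes whose adjusted p-value falls below a chosen level, such as 0.05, would be declared differentially expressed at that family-wise error rate. The estimate here is the plain proportion over random permutations, and a step-down variant would generally be less conservative.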