Ingrid Lönnstedt and Terry Speed (2002). Replicated microarray data. Vol. 12, No. 1.

Statistica Sinica 12(2002), 31-46

REPLICATED MICROARRAY DATA

Ingrid Lönnstedt and Terry Speed $^\dagger$

Uppsala University, $^\dagger$ University of California, Berkeley and $^\dagger$ Walter and Eliza Hall Institute

Abstract: cDNA microarrays permit us to study the expression of thousands of genes simultaneously. They are now used in many different contexts to compare mRNA levels between two or more samples of cells. Microarray experiments typically give us expression measurements on a large number of genes, say 10,000-20,000, but with few, if any, replicates for each gene. Traditional methods using means and standard deviations to detect differential expression are not completely satisfactory in this context, and so a different approach seems desirable. In this paper we present an empirical Bayes method for analysing replicated microarray data. Data from all the genes in a replicate set of experiments are combined into estimates of parameters of a prior distribution. These parameter estimates are then combined at the gene level with means and standard deviations to form a statistic $B$ which can be used to decide whether differential expression has occurred. The statistic $B$ avoids the problems of using averages or $t$-statistics. The method is illustrated using data from an experiment comparing the expression of genes in the livers of SR-BI transgenic mice with that of the corresponding wild-type mice. In addition we present the results of a simulation study estimating the ROC curve of $B$ and three other statistics for determining differential expression: the average and two simple modifications of the usual $t$-statistic. $B$ was found to be the most powerful of the four, though the margin was not great. The data were simulated to resemble the SR-BI data.

Key words and phrases: cDNA microarray, differential expression, empirical Bayes, replication, ROC curve, t-statistic.