

Statistica Sinica 12(2002), 219-240



STATISTICAL ISSUES IN THE CLUSTERING OF GENE EXPRESSION DATA


Darlene R. Goldstein, Debashis Ghosh and Erin M. Conlon


University of California, University of Michigan and Harvard University


Abstract: This paper illustrates some of the problems which can occur in any data set when clustering samples of gene expression profiles. These include a possible high degree of dependence of results on choice of clustering algorithm, further dependence of results on the choices of genes and samples to be included in the clustering (for example, whether or not to include control samples), and difficulty in assessing the validity of the grouping. We also demonstrate the use of Cox regression as a tool to identify genes influencing survival.



Key words and phrases: Cluster analysis, Cox regression, microarray experiment, survival analysis, unsupervised learning.

