Statistica Sinica 12(2002), 219-240
STATISTICAL ISSUES IN THE CLUSTERING
OF GENE EXPRESSION DATA
Darlene R. Goldstein, Debashis Ghosh and Erin M. Conlon
University of California, University of Michigan and
Harvard University
Abstract:
This paper illustrates some of the problems that can occur
in any data set when clustering samples of gene expression profiles.
These include a potentially high degree of dependence of the results on the
choice of clustering algorithm, further dependence on the choice of genes
and samples included in the clustering (for example, whether or not to include
control samples), and difficulty in assessing the validity of the resulting grouping.
We also demonstrate the use of Cox regression as a tool to identify
genes influencing survival.
Key words and phrases:
Cluster analysis, Cox regression, microarray experiment, survival analysis,
unsupervised learning.