

Statistica Sinica 15(2005), 19-33





A RAPID METHOD FOR THE COMPARISON

OF CLUSTER ANALYSES


Cavan Reilly, Changchun Wang and Mark Rutherford


University of Minnesota


Abstract: Cluster analysis has become a very popular tool for exploring high-dimensional data. Dozens of algorithms have been proposed, each with its own merits and shortcomings. It is not known to what extent the various methods give the same results, nor is it even clear how to measure how similar the outputs of two distinct algorithms are. Here we propose a statistic designed to measure the "correlation" between two clustering methods when they are applied to a particular data set. In contrast to the Rand index, the statistic most commonly used for this purpose, the method is very fast. We provide an algorithm that approximates the statistic and demonstrate two of its possible uses. Finally, we use this statistic to understand the clustering of a data set from the context that motivated this work: the analysis of a gene expression experiment.
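For reference, the Rand index mentioned in the abstract looks at every pair of observations and asks whether two clusterings agree on putting that pair together or apart. Below is a minimal Python sketch of that textbook definition. It is not the statistic proposed in this paper, and the function name `rand_index` is our own. This naive version loops over all n(n-1)/2 pairs, so its running time grows quadratically with the number of observations.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two points in the same
    cluster, or both put them in different clusters. This naive sketch
    visits all n(n-1)/2 pairs, so its cost is quadratic in n.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two points")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]  # together under clustering A?
        same_b = labels_b[i] == labels_b[j]  # together under clustering B?
        agree += same_a == same_b            # count pairs where A and B agree
        total += 1
    return agree / total
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, because the index ignores how the clusters are labeled and depends only on which points are grouped together.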



Key words and phrases: Cluster analysis, Cohen's kappa, Metropolis algorithm, microarray, traveling salesman problem.

