Cavan Reilly, Changchun Wang and Mark Rutherford (2005). A rapid method for the comparison of cluster analyses. Vol.15, No.1

Statistica Sinica 15(2005), 19-33

A RAPID METHOD FOR THE COMPARISON OF CLUSTER ANALYSES

Cavan Reilly, Changchun Wang and Mark Rutherford

University of Minnesota

Abstract: Cluster analysis has become a very popular tool for the exploration of high-dimensional data. Dozens of algorithms have been proposed, each with its own merits and shortcomings. It is not known to what extent various methods give the same results, nor is it even clear how to measure the similarity between the outputs of two distinct algorithms. Here we propose a statistic that is designed to measure the "correlation" between two clustering methods when applied to a particular data set. In contrast to the Rand index, the most common statistic used for this purpose, the method is very fast. We provide an algorithm that approximates the statistic and demonstrate two of its possible uses. Finally, we use this statistic to understand the clustering in a data set in the context that motivated this work: analysis of a gene expression experiment.

Key words and phrases: Cluster analysis, Cohen's kappa, Metropolis algorithm, microarray, traveling salesman problem.