Back To Index Previous Article Next Article Full Text

Statistica Sinica 12(2002), 7-29




Chun-Houh Chen

Academia Sinica, Taipei

Abstract: Given a $p$-dimensional proximity matrix $D_{p\times p}$, a sequence of correlation matrices, ${\bf R}=(R^{(1)},R^{(2)} ,\ldots)$, is iteratively formed from it. Here $R^{(1)}$ is the correlation matrix of the original proximity matrix $D$ and $R^{(n)}$ is the correlation matrix of $R^{(n-1)}$, $n > 1$. This sequence was first introduced by McQuitty (1968), Breiger, Boorman and Arabie (1975) developed an algorithm, CONCOR, based on their rediscovery of its convergence. The sequence ${\bf R}$ often converges to a matrix $R^{(\infty)}$ whose elements are $+1$ or $-1$. This special pattern of $R^{(\infty)}$ partitions the $p$ objects into two disjoint groups and so can be recursively applied to generate a divisive hierarchical clustering tree. While convergence is itself useful, we are more concerned with what happens before convergence. Prior to convergence, we note a rank reduction property with elliptical structure: when the rank of $R^{(n)}$ reaches two, the column vectors of $R^{(n)}$ fall on an ellipse in a two-dimensional subspace. The unique order of relative positions for the $p$ points on the ellipse can be used to solve seriation problems such as the reordering of a Robinson matrix. A software package, Generalized Association Plots (GAP), is developed which utilizes computer graphics to retrieve important information hidden in the data or proximity matrices.

Key words and phrases: Data visualization, divisive clustering tree, latent structure, perfect symmetry, proximity matrices, seriation.

Back To Index Previous Article Next Article Full Text