Statistica Sinica

Chun-Houh Chen

Abstract:Given a -dimensional proximity matrix , a sequence of correlation matrices, , is iteratively formed from it. Here is the correlation matrix of the original proximity matrix and is the correlation matrix of , . This sequence was first introduced by McQuitty (1968), Breiger, Boorman and Arabie (1975) developed an algorithm, CONCOR, based on their rediscovery of its convergence. The sequence often converges to a matrix whose elements are or . This special pattern of partitions the objects into two disjoint groups and so can be recursively applied to generate a divisive hierarchical clustering tree. While convergence is itself useful, we are more concerned with what happens before convergence. Prior to convergence, we note a rank reduction property with elliptical structure: when the rank of reaches two, the column vectors of fall on an ellipse in a two-dimensional subspace. The unique order of relative positions for the points on the ellipse can be used to solve seriation problems such as the reordering of a Robinson matrix. A software package, Generalized Association Plots (GAP), is developed which utilizes computer graphics to retrieve important information hidden in the data or proximity matrices.

Key words and phrases:Data visualization, divisive clustering tree, latent structure, perfect symmetry, proximity matrices, seriation.