Chun-Houh Chen (2002). Generalized association plots: information visualization via iteratively generated correlation matrices. Vol.12, No.1.

Statistica Sinica 12(2002), 7-29

GENERALIZED ASSOCIATION PLOTS:

INFORMATION VISUALIZATION VIA ITERATIVELY

GENERATED CORRELATION MATRICES

Chun-Houh Chen

Academia Sinica, Taipei

Abstract: Given a -dimensional proximity matrix $D_{p\times p}$ , a sequence of correlation matrices, ${\bf R}=(R^{(1)},R^{(2)} ,\ldots)$ , is iteratively formed from it. Here $R^{(1)}$ is the correlation matrix of the original proximity matrix and $R^{(n)}$ is the correlation matrix of $R^{(n-1)}$ , . This sequence was first introduced by McQuitty (1968), Breiger, Boorman and Arabie (1975) developed an algorithm, CONCOR, based on their rediscovery of its convergence. The sequence ${\bf R}$ often converges to a matrix $R^{(\infty)}$ whose elements are or . This special pattern of $R^{(\infty)}$ partitions the objects into two disjoint groups and so can be recursively applied to generate a divisive hierarchical clustering tree. While convergence is itself useful, we are more concerned with what happens before convergence. Prior to convergence, we note a rank reduction property with elliptical structure: when the rank of $R^{(n)}$ reaches two, the column vectors of $R^{(n)}$ fall on an ellipse in a two-dimensional subspace. The unique order of relative positions for the points on the ellipse can be used to solve seriation problems such as the reordering of a Robinson matrix. A software package, Generalized Association Plots (GAP), is developed which utilizes computer graphics to retrieve important information hidden in the data or proximity matrices.

Key words and phrases: Data visualization, divisive clustering tree, latent structure, perfect symmetry, proximity matrices, seriation.