
Statistica Sinica 19 (2009), 1741-1754


I-Ping Tu, Hung Chen and Xin Chen

Academia Sinica, Taiwan, National Taiwan University and UC San Francisco

Abstract: Principal components analysis is perhaps the most widely used method for exploring multivariate data. In this paper, we propose a variability plot, composed of measures of the stability of each eigenvector over samples, as a data exploration tool. We also show through asymptotic analysis that this variability measure gives a good measure of the intersample variability of eigenvectors. For distinct eigenvalues, the asymptotic behavior of this variability measure is comparable to the size of the asymptotic covariance of the eigenvector in Anderson (1963). When this method is applied to a gene expression data set from a gastric cancer study, many hills are observed on the proposed variability plot. We are able to show that each hill groups a set of multiple eigenvalues. When the intersample variability of eigenvectors is considered, the cutoff point for informative eigenvectors should not be at the top of a hill, as suggested by the proposed variability plot. We also try the proposed method on functional data analysis through a simulated data set with dimension $p$ greater than sample size $n$. The proposed variability plot is successful at distinguishing the signal components, noise components and zero eigenvalue components.

Key words and phrases: Dimension reduction, eigenvector, functional data analysis, principal components analysis, resampling, the multiplicity of eigenvalues, the scree plot.
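
As a rough illustration of the idea of measuring eigenvector stability over samples, the following minimal sketch resamples the rows of a data matrix and records how far each resampled eigenvector rotates away from its full-sample counterpart. The abstract does not define the variability measure, so everything here is an assumption made for illustration: the function name `eigvec_variability`, the use of the bootstrap as the resampling scheme, and the choice of the squared sine of the angle between matched eigenvectors as the instability score. It is not the authors' definition.

```python
import numpy as np


def eigvec_variability(X, n_boot=200, seed=0):
    """Bootstrap instability score for each sample eigenvector.

    Hypothetical measure (an assumption, not the paper's): the average of
    sin^2(angle) between the k-th eigenvector of the full-sample covariance
    and the k-th eigenvector of each bootstrap-sample covariance.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Reference eigenvectors; eigh returns ascending eigenvalues, so reverse
    # the columns to order them by decreasing eigenvalue.
    _, ref = np.linalg.eigh(np.cov(X, rowvar=False))
    ref = ref[:, ::-1]

    score = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        _, vec = np.linalg.eigh(np.cov(X[idx], rowvar=False))
        vec = vec[:, ::-1]
        # Absolute cosine between matched eigenvectors (sign-invariant),
        # capped at 1 to guard against floating-point overshoot.
        cos = np.minimum(np.abs(np.sum(ref * vec, axis=0)), 1.0)
        score += 1.0 - cos**2
    return score / n_boot


if __name__ == "__main__":
    # One well-separated eigenvalue followed by three equal ones.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4)) * np.array([10.0, 1.0, 1.0, 1.0])
    print(eigvec_variability(X, n_boot=50, seed=2))
```

Plotting the returned scores against the component index gives a simple variability plot under these assumptions. In the simulated example, the first eigenvector has a well-separated eigenvalue and should score close to zero, while the three components that share an eigenvalue should score much higher, which is the kind of grouping of multiple eigenvalues that the abstract describes as a hill.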
