
Statistica Sinica 26 (2016), 1747-1770

THE STATISTICS AND MATHEMATICS OF HIGH DIMENSION LOW SAMPLE SIZE ASYMPTOTICS
Dan Shen¹, Haipeng Shen², Hongtu Zhu³ and J. S. Marron³
¹University of South Florida, ²University of Hong Kong
and ³University of North Carolina at Chapel Hill

Abstract: The aim of this paper is to establish several theoretical properties of principal component analysis for multiple-component spike covariance models. Our results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are explored, and additional theoretical results are presented.

Key words and phrases: Big data, conical behavior, high dimension low sample size, PCA.
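
The role of the ratio between the dimension and the product of the sample size with the spike size can be illustrated by a small simulation. The sketch below is not taken from the paper: it assumes a single-spike model with Gaussian data, population covariance diag(spike, 1, ..., 1), a known zero mean (so no centering is applied), and a fixed number of power-iteration steps. The function name `leading_angle_deg` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def leading_angle_deg(d, n, spike, seed=0, iters=200):
    """Angle in degrees between the leading sample eigenvector and e_1.

    Assumed setup (illustrative only): n i.i.d. rows from
    N(0, diag(spike, 1, ..., 1)), so the first population eigenvector is e_1.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    root = math.sqrt(spike)
    for row in X:
        row[0] *= root  # inflate the variance of the first coordinate

    # The n x n Gram matrix X X^T has the same nonzero eigenvalues as X^T X,
    # and is much smaller when d >> n.
    G = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
         for i in range(n)]

    # Power iteration: approximate leading eigenvector v of G.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Map back to R^d: the leading sample eigenvector is X^T v, normalized.
    u = [sum(v[i] * X[i][k] for i in range(n)) for k in range(d)]
    u_norm = math.sqrt(sum(x * x for x in u))
    cos = min(1.0, abs(u[0]) / u_norm)
    return math.degrees(math.acos(cos))


if __name__ == "__main__":
    d, n = 500, 10
    for ratio in (0.01, 1.0, 100.0):
        spike = d / (n * ratio)  # chosen so that d / (n * spike) == ratio
        angle = leading_angle_deg(d, n, spike)
        print(f"d/(n*spike) = {ratio:>6}: angle = {angle:.1f} degrees")
```

Under these assumptions, a small ratio should give an angle near zero, and a large ratio an angle near 90 degrees. Because the sample size is held fixed at a small value, the angle at intermediate ratios varies from seed to seed rather than settling at a single value.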
