

Statistica Sinica 22 (2012), 933-956





PRINCIPAL COMPONENT ANALYSIS IN VERY

HIGH-DIMENSIONAL SPACES


Young Kyung Lee$^1$, Eun Ryung Lee$^2$ and Byeong U. Park$^2$


$^1$Kangwon National University and $^2$Seoul National University


Abstract: Principal component analysis (PCA) is widely used as a means of dimension reduction in high-dimensional data analysis. A main disadvantage of standard PCA is that the principal components are typically linear combinations of all variables, which makes the results difficult to interpret. Standard PCA also fails to yield consistent estimators of the loading vectors in very high-dimensional settings where the dimension of the data is comparable to, or even larger than, the sample size. In this paper we propose a modification of standard PCA that works for such high-dimensional data when the loadings of the principal components are sparse. Our method starts with an initial subset selection and then performs a penalized PCA based on the selected subset. We show that our procedure correctly identifies the sparsity of the loading vectors and enjoys the oracle property, meaning that the resulting estimators of the loading vectors have the same first-order asymptotic properties as the oracle estimators that use knowledge of the indices of the nonzero loadings. Our theory covers a variety of penalty schemes. We also provide numerical evidence of the performance of the proposed method and illustrate it with gene expression data.
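The two-stage idea in the abstract can be illustrated with a minimal sketch. The code below is not the authors' procedure; it is a generic stand-in that screens variables by marginal sample variance (a hypothetical choice of initial subset selection) and then runs a soft-thresholded power iteration on the screened block as a simple l1-penalized surrogate for the penalized PCA step. The function name, the screening rule, and the tuning values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sparse_pca_first_pc(X, screen_size, lam, n_iter=100):
    """Sketch of a two-stage sparse PCA for the first component:
    (1) screen variables by marginal sample variance (assumed rule),
    (2) soft-thresholded power iteration on the screened covariance block."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Stage 1: initial subset selection by largest marginal variances
    idx = np.argsort(Xc.var(axis=0))[::-1][:screen_size]
    S = Xc[:, idx].T @ Xc[:, idx] / n        # covariance of screened block
    # Stage 2: penalized power iterations; soft-thresholding induces sparsity
    v = np.linalg.eigh(S)[1][:, -1]          # warm start at top eigenvector
    for _ in range(n_iter):
        u = S @ v
        u = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft-threshold
        norm = np.linalg.norm(u)
        if norm == 0.0:                      # penalty killed all coordinates
            break
        v = u / norm
    loading = np.zeros(p)                    # embed back into full dimension
    loading[idx] = v
    return loading

# Toy data with p > n and a sparse leading loading vector (5 active variables)
rng = np.random.default_rng(0)
p, n = 200, 100
true = np.zeros(p)
true[:5] = 1.0 / np.sqrt(5)
scores = rng.normal(size=(n, 1)) * 3.0
X = scores @ true[None, :] + rng.normal(size=(n, p)) * 0.5
est = sparse_pca_first_pc(X, screen_size=20, lam=0.05)
```

Under this strong-signal setup the estimated loading vector is supported only on the screened variables and aligns closely with the true sparse direction; the screening stage is what keeps the second stage feasible when p is comparable to or larger than n.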



Key words and phrases: Adaptive lasso, eigenvalues, eigenvectors, high-dimensional data, MC penalization, penalized principal component analysis, SCAD, sparsity.
