, Eun Ryung Lee and Byeong U. Park
Kangwon National University and Seoul National University
Abstract: Principal component analysis (PCA) is widely used as a means of dimension reduction for high-dimensional data analysis. A main disadvantage of the standard PCA is that the principal components are typically linear combinations of all variables, which makes the results difficult to interpret. The standard PCA also fails to yield consistent estimators of the loading vectors in very high-dimensional settings where the dimension of the data is comparable to, or even larger than, the sample size. In this paper we propose a modification of the standard PCA that works for such high-dimensional data when the loadings of the principal components are sparse. Our method starts with an initial subset selection, and then performs a penalized PCA based on the selected subset. We show that our procedure correctly identifies the sparsity pattern of the loading vectors and enjoys the oracle property, meaning that the resulting estimators of the loading vectors have the same first-order asymptotic properties as the oracle estimators that use knowledge of the indices of the nonzero loadings. Our theory covers a variety of penalty schemes. We also provide numerical evidence for the performance of the proposed method, and illustrate it on gene expression data.
Key words and phrases: Adaptive lasso, eigenvalues, eigenvectors, high-dimensional data, MC penalization, penalized principal component analysis, SCAD, sparsity.
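
As a rough illustration of the two-stage idea described in the abstract (initial subset selection followed by a penalized PCA on the selected subset), the following Python sketch may help fix ideas. It is not the procedure proposed in this paper: the screening rule (keeping the k variables with the largest sample variance), the soft-thresholding (lasso-type) penalty, the power-iteration solver, and all function and parameter names (screen_by_variance, sparse_leading_loading, k, lam) are assumptions made only for illustration.

```python
import numpy as np


def screen_by_variance(X, k):
    """Assumed screening rule: indices of the k columns with largest sample variance."""
    variances = X.var(axis=0)
    return np.sort(np.argsort(variances)[-k:])


def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal map of a lasso-type penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_leading_loading(X, k, lam, n_iter=200):
    """Two-stage sketch: screen variables, then run a thresholded power
    iteration on the sample covariance of the retained variables only."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)                  # center each variable
    keep = screen_by_variance(Xc, k)         # stage 1: subset selection
    S = Xc[:, keep].T @ Xc[:, keep] / n      # reduced sample covariance
    v = np.linalg.eigh(S)[1][:, -1]          # start at the ordinary leading eigenvector
    for _ in range(n_iter):                  # stage 2: penalized refinement
        w = soft_threshold(S @ v, lam)
        norm = np.linalg.norm(w)
        if norm == 0.0:                      # penalty removed every coordinate
            break
        v = w / norm
    loading = np.zeros(d)
    loading[keep] = v                        # embed back into the full dimension
    return loading
```

Screening first keeps the eigen-decomposition at size k rather than at the full dimension, which is the practical appeal of a two-stage scheme when the dimension exceeds the sample size. The threshold lam here acts on the scale of the covariance entries and would need tuning in practice.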