

Statistica Sinica 19 (2009), 1557-1565





ON PRINCIPAL COMPONENTS AND REGRESSION: A STATISTICAL EXPLANATION OF A NATURAL PHENOMENON


Andreas Artemiou and Bing Li


Pennsylvania State University


Abstract: In this note we give a probabilistic explanation of a phenomenon that is frequently observed but whose cause is not well understood: in a regression setting, the response ($Y$) is often highly correlated with the leading principal components of the predictor ($\boldsymbol{X}$), even though there seems to be no logical reason for this connection. This phenomenon has long been noticed and discussed in the literature, and it has received renewed interest recently because of the need to regress $Y$ on $\boldsymbol{X}$ of very high dimension, often with comparatively few sampling units, in which case it seems natural to regress on the first few principal components of $\boldsymbol{X}$. This work stems from a discussion of a recent paper by Cook (2007) which, along with other developments, described a historical debate surrounding, and current interest in, this phenomenon.



Key words and phrases: Dimension reduction, orientationally uniform distribution, principal components, random covariance matrices, regression, stochastic ordering.
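
The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method; it rests on assumptions chosen only for illustration: a linear model $Y = \beta^{T}\boldsymbol{X} + \varepsilon$ with noise independent of $\boldsymbol{X}$, eigenvalues of the covariance matrix drawn from an exponential distribution, and a coefficient vector $\beta$ whose direction is uniform on the unit sphere relative to the eigenvectors. Under these assumptions the squared correlation between $\beta^{T}\boldsymbol{X}$ and the $i$th principal component is $\lambda_i b_i^2 / \sum_j \lambda_j b_j^2$, where $b_i$ is the coordinate of $\beta$ along the $i$th eigenvector. Independent noise rescales every correlation with $Y$ by the same factor, so it does not change their ordering. The names `squared_correlations` and `simulate` are hypothetical.

```python
import math
import random


def squared_correlations(eigenvalues, coords):
    """Squared population correlation between beta'X and each principal component.

    eigenvalues: variances of the principal components (lambda_i).
    coords: coordinates of beta in the eigenvector basis (b_i).
    Returns lambda_i * b_i^2 / sum_j lambda_j * b_j^2 for each i.
    """
    weights = [lam * b * b for lam, b in zip(eigenvalues, coords)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(p=10, trials=20000, seed=0):
    """Fraction of trials in which the first PC beats the last PC.

    Assumption (illustrative only): eigenvalues are i.i.d. exponential draws,
    sorted in decreasing order, and beta has a uniformly random direction.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Random spectrum, largest eigenvalue first.
        eigenvalues = sorted((rng.expovariate(1.0) for _ in range(p)), reverse=True)
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in g))
        coords = [x / norm for x in g]
        rho2 = squared_correlations(eigenvalues, coords)
        if rho2[0] > rho2[-1]:
            wins += 1
    return wins / trials


fraction = simulate()
print(f"Fraction of trials where PC 1 is more correlated than PC p: {fraction:.3f}")
```

In this setup the coordinates $b_i$ are exchangeable, so any tendency for the first component to dominate comes entirely from the ordering of the eigenvalues, with no built-in link between $\beta$ and the leading eigenvector.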
