Statistica Sinica

Andreas Artemiou and Bing Li

Abstract: In this note we give a probabilistic explanation of a phenomenon that is frequently observed but whose cause is not well understood. That is, in a regression setting, the response (Y) is often highly correlated with the leading principal components of the predictor (X), even though there seems to be no logical reason for this connection. This phenomenon has long been noticed and discussed in the literature, and it has received renewed interest recently because of the need to regress Y on an X of very high dimension, often with comparatively few sampling units, in which case it seems natural to regress Y on the first few principal components of X. This work stems from a discussion of a recent paper by Cook (2007) which, along with other developments, described the historical debate surrounding this phenomenon and the current interest in it.
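The phenomenon can be illustrated with a small Monte Carlo sketch. It is not the authors' derivation. It assumes a hypothetical linear model Y = beta^T X + eps with Var(eps) = sigma2, works in the eigenbasis of Cov(X) (so PC i is X_i, with variance lambda_i), and draws beta uniformly on the unit sphere. That choice plays the role of an orientationally uniform relationship between the regression direction and the eigenvectors. Under these assumptions corr^2(Y, PC_i) = lambda_i beta_i^2 / (sum_j lambda_j beta_j^2 + sigma2). Because beta_i^2 / beta_k^2 and beta_k^2 / beta_i^2 have the same distribution, PC i is more correlated with Y than PC k with probability above 1/2 whenever lambda_i > lambda_k. The function name `pc_squared_correlations` and the eigenvalues below are illustrative only.

```python
import numpy as np


def pc_squared_correlations(eigvals, sigma2=1.0, n_draws=20000, seed=0):
    """Squared population correlations between Y and each principal component.

    Hypothetical setup: Y = beta^T X + eps, Var(eps) = sigma2, Cov(X) has
    eigenvalues `eigvals`, and beta is drawn uniformly on the unit sphere
    (expressed in the eigenbasis of Cov(X), so PC i is coordinate i).
    Returns an (n_draws, p) array, one row per random beta.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigvals, dtype=float)
    p = lam.size
    # A normalized standard normal vector is uniform on the unit sphere.
    beta = rng.standard_normal((n_draws, p))
    beta /= np.linalg.norm(beta, axis=1, keepdims=True)
    # Var(Y) = sum_j lam_j beta_j^2 + sigma2; Cov(Y, PC_i) = lam_i beta_i.
    var_y = (lam * beta**2).sum(axis=1) + sigma2
    # corr^2(Y, PC_i) = (lam_i beta_i)^2 / (lam_i * Var(Y)).
    return lam * beta**2 / var_y[:, None]


lam = np.array([5.0, 2.0, 1.0, 0.5, 0.1])  # illustrative eigenvalues
r2 = pc_squared_correlations(lam)
means = r2.mean(axis=0)  # average squared correlation for each PC
# Fraction of random betas for which PC1 is more correlated with Y than PC5.
frac = (r2[:, 0] > r2[:, -1]).mean()
print(means)
print(frac)
```

In this sketch the leading components tend to carry more of the correlation with Y simply because they have larger variance, even though beta has no preferred orientation relative to them.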

Key words and phrases: Dimension reduction, orientationally uniform distribution, principal components, random covariance matrices, regression, stochastic ordering.