Statistica Sinica 27 (2017), 1437-1459

ROBUST PRINCIPAL COMPONENT ANALYSIS BASED ON TRIMMING AROUND AFFINE SUBSPACES

C. Croux^{1}, L.A. García-Escudero^{2}, A. Gordaliza^{2}, C. Ruwet^{3} and R. San Martín^{2}

Abstract: Principal Component Analysis (PCA) is a widely used technique for reducing the dimensionality of multivariate data. The principal component subspace is defined as the affine subspace of a given dimension 𝑑 that gives the best least-squares fit to the data, that is, the one minimizing the sum of squared orthogonal distances from the observations to the subspace. PCA suffers from a well-known lack of robustness: because it relies on least squares, even a single outlying observation can strongly distort the fitted subspace. As a robust alternative, one can resort to an impartial trimming approach, in which the data themselves determine which observations are discarded, and search for the best subsample containing a proportion 1 − α of the observations, with 0 < α < 1, together with the best 𝑑-dimensional affine subspace fitting this subsample. This yields the trimmed principal component subspace. A population version is given, and the existence of solutions to both the sample and the population problems is proven. Under mild conditions, the solutions of the sample problem are consistent estimators of the solutions of the population problem. The robustness of the method is studied by proving qualitative robustness, computing the breakdown point, and deriving the influence functions. Furthermore, asymptotic efficiencies at the normal model are derived, and finite-sample efficiencies are studied by means of a simulation study.
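The trimmed problem can be made concrete with a small sketch. For a fixed subsample H, the best 𝑑-dimensional affine subspace is the classical PCA fit of H (its mean plus the top 𝑑 principal directions). For a fixed subspace, the best subsample of size h = ⌈n(1 − α)⌉ consists of the h observations with the smallest squared orthogonal residuals. Alternating these two steps therefore never increases the trimmed objective. The code below is a minimal sketch assuming such a concentration-step ("C-step") heuristic with random restarts, in the spirit of FAST-MCD; it is not necessarily the algorithm used by the authors, and the names `fit_subspace`, `residuals` and `trimmed_pca` are hypothetical.

```python
import numpy as np

def fit_subspace(X, d):
    """Least-squares d-dim affine subspace of X: mean plus top-d principal directions."""
    center = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    return center, vt[:d].T  # basis has orthonormal columns, shape (p, d)

def residuals(X, center, basis):
    """Squared orthogonal distances from each row of X to the affine subspace."""
    z = X - center
    return np.sum((z - z @ basis @ basis.T) ** 2, axis=1)

def trimmed_pca(X, d, alpha, n_starts=20, max_iter=100, seed=0):
    """Hypothetical C-step search for the trimmed principal component subspace."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = int(np.ceil(n * (1 - alpha)))  # size of the retained subsample
    best = None
    for _ in range(n_starts):
        # Random start: d + 1 observations span an initial affine subspace.
        start = rng.choice(n, size=d + 1, replace=False)
        center, basis = fit_subspace(X[start], d)
        prev = np.inf
        for _ in range(max_iter):
            # Keep the h observations closest to the current subspace ...
            keep = np.argsort(residuals(X, center, basis))[:h]
            # ... then refit the subspace on that subsample only.
            center, basis = fit_subspace(X[keep], d)
            obj = residuals(X[keep], center, basis).sum()
            if obj >= prev - 1e-12:  # no further decrease: local optimum reached
                break
            prev = obj
        if best is None or obj < best[0]:
            best = (obj, center, basis, keep)
    return best  # (objective, center, basis, indices of retained observations)
```

Each start only reaches a local optimum of the trimmed objective, which is why several random starts are run and the one with the smallest trimmed sum of squared residuals is kept. Setting α close to 0 recovers ordinary PCA on (almost) all observations.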

Key words and phrases: Affine subspaces, dimension reduction, multivariate statistics, orthogonal regression, principal components, robustness, trimming.