Statistica Sinica 27 (2017), 243-260
Abstract: In survey sampling, calibration is a popular tool for making estimators of totals consistent with known totals of auxiliary variables and for reducing variance. When the number of auxiliary variables is large, calibrating on all of them may lead to estimators of totals whose mean squared error (MSE) is larger than that of the Horvitz-Thompson estimator, even though this simple estimator does not use the available auxiliary information. We study a new technique based on dimension reduction through principal components that can be useful in this high-dimensional context. Calibration is performed on the first principal components, which can be viewed as synthetic variables capturing most of the variability of the auxiliary variables. When some auxiliary variables play a more important role than others, the method can be adapted to provide exact calibration on these variables. Some asymptotic properties are given in which the number of variables is allowed to tend to infinity with the population size. A data-driven criterion for selecting the number of principal components, which ensures that all the sampling weights remain positive, is discussed. The methodology is illustrated, in a multipurpose context, by an application to the estimation of electricity consumption with the help of 336 auxiliary variables.
Key words and phrases: Dimension reduction, model-assisted estimation, multipurpose surveys, partial calibration, partial least squares, penalized calibration, ridge regression, survey sampling, variance approximation.
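
The idea summarized in the abstract, calibrating on the first few principal components rather than on all auxiliary variables, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the auxiliary variables are known for every population unit, the calibration is linear (chi-square distance), an intercept is added so that the weights sum to the population size, and the data are simulated. The function name `pc_calibration_weights` and all parameter names are hypothetical.

```python
import numpy as np

def pc_calibration_weights(X_pop, sample_idx, pi, r):
    """Linear calibration weights computed on the first r principal components.

    Assumptions (illustrative only): X_pop is known for all N units,
    pi holds first-order inclusion probabilities, and an intercept is
    included among the calibration variables.
    """
    N = X_pop.shape[0]
    # Center the auxiliary variables with their population means.
    Xc = X_pop - X_pop.mean(axis=0)
    # Eigen-decomposition of the population matrix N^{-1} X'X.
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / N)
    # Keep the eigenvectors associated with the r largest eigenvalues.
    top = np.argsort(eigval)[::-1][:r]
    # Principal component scores for every population unit, plus intercept.
    Z = np.column_stack([np.ones(N), Xc @ eigvec[:, top]])
    t_z = Z.sum(axis=0)                 # known population totals of the scores
    Zs = Z[sample_idx]                  # scores of the sampled units
    d = 1.0 / pi[sample_idx]            # Horvitz-Thompson (design) weights
    ht_z = d @ Zs                       # Horvitz-Thompson estimate of t_z
    T = (Zs * d[:, None]).T @ Zs        # weighted cross-product matrix
    lam = np.linalg.solve(T, t_z - ht_z)
    w = d * (1.0 + Zs @ lam)            # calibrated weights
    return w, Z

# Simulated example: simple random sampling without replacement.
rng = np.random.default_rng(0)
N, p, n, r = 1000, 50, 100, 5
X_pop = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
sample_idx = rng.choice(N, size=n, replace=False)
pi = np.full(N, n / N)
w, Z = pc_calibration_weights(X_pop, sample_idx, pi, r)

# The weighted sample totals of the scores match the population totals.
calibration_ok = np.allclose(w @ Z[sample_idx], Z.sum(axis=0), atol=1e-6)
```

In this sketch the calibration constraints hold exactly for the intercept and the r retained components, but not for the original variables. Some weights in `w` may be negative; one way to mimic a data-driven choice of r would be to keep the largest r for which `w.min() > 0`, although the criterion actually used in the paper is not detailed in the abstract.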