
Statistica Sinica 30 (2020), 1135-1154

GENERALIZED REGRESSION ESTIMATORS WITH
HIGH-DIMENSIONAL COVARIATES
Tram Ta¹, Jun Shao¹,², Quefeng Li³ and Lei Wang⁴
¹University of Wisconsin, Madison; ²East China Normal University;
³University of North Carolina, Chapel Hill; and ⁴Nankai University

Abstract: Data on a large number of covariates with known population totals are frequently available in survey studies. These auxiliary variables contain valuable information that can be incorporated into the estimation of the population total of a survey variable in order to improve estimation precision. We consider a generalized regression estimator formulated under a model-assisted framework, in which a regression model relating the survey variable to the available covariates is used, while the estimator retains its basic design-based properties. When the number of covariates is fixed, the generalized regression estimator is shown to improve on the efficiency of the design-based Horvitz-Thompson estimator. We investigate the performance of the generalized regression estimator when the number of covariates p is allowed to diverge as the sample size n increases. We examine two approaches. First, the model parameter is estimated by weighted least squares when p < n. Second, the Lasso is employed when the model parameter is sparse. We show that, under an assisted model and certain conditions on the joint distribution of the covariates and on the divergence rates of n and p, the generalized regression estimator is asymptotically more efficient than the Horvitz-Thompson estimator and is robust against model misspecification. We also study the consistency of variance estimation for the generalized regression estimator. Our theoretical results are corroborated by simulation studies and an example.
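The abstract gives no formulas, so the following is only a minimal sketch, assuming the textbook form of the generalized regression (GREG) estimator: the Horvitz-Thompson total of y plus a correction (X_total - X_HT)ᵀβ̂, with β̂ fitted by weighted least squares using design weights 1/π_i. The paper's exact weighting, model, and variance estimator may differ. The function names `horvitz_thompson_total` and `greg_total` are hypothetical and used here for illustration only.

```python
import numpy as np


def horvitz_thompson_total(y, pi):
    """HT estimator of a population total: sum over the sample of y_i / pi_i."""
    return np.sum(y / pi)


def greg_total(y, X, pi, X_pop_total):
    """GREG estimator of the population total of y (sketch, WLS-assisted).

    y           : (n,) survey variable observed on the sample
    X           : (n, p) covariates observed on the sample, assumed p < n
    pi          : (n,) first-order inclusion probabilities
    X_pop_total : (p,) known population totals of the covariates
    """
    d = 1.0 / pi  # design weights
    # Weighted least squares with weights d_i: solve (X' D X) beta = X' D y.
    # Assumption: weights equal to the design weights (a common textbook choice).
    XtDX = X.T @ (d[:, None] * X)
    XtDy = X.T @ (d * y)
    beta = np.linalg.solve(XtDX, XtDy)
    # HT estimates of the covariate totals
    X_ht = X.T @ d
    # Adjust the HT total of y by the gap between known and estimated covariate totals
    return horvitz_thompson_total(y, pi) + (X_pop_total - X_ht) @ beta
```

A quick check of the design: when y is an exact linear function of the covariates (with an intercept column), the WLS fit recovers the coefficients exactly, and the GREG estimate then equals the true population total, whereas the HT estimate generally does not. In a Lasso-based variant, the `np.linalg.solve` step would be replaced by a sparse estimate of β, for example a Lasso fit whose loss is weighted by the design weights. Whether the paper weights its Lasso loss this way is not stated in the abstract.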

Key words and phrases: Asymptotic efficiency, auxiliary information, high dimension, Lasso, model-assisted, survey sampling.
