Jean-François Beaumont, David Haziza and Cynthia Bocci (2011). On variance estimation under auxiliary value imputation in sample surveys. Statistica Sinica, Vol. 21, No. 2, 515-537.

Statistica Sinica 21 (2011), 515-537

ON VARIANCE ESTIMATION UNDER AUXILIARY VALUE IMPUTATION IN SAMPLE SURVEYS

Jean-François Beaumont, David Haziza and Cynthia Bocci

Statistics Canada and Université de Montréal

Abstract: We study the problem of variance estimation for a domain total when auxiliary value imputation, sometimes called cold-deck or substitution imputation, has been used to fill in missing data. We consider two approaches to inference which lead to different variance estimators. In the first approach, the validity of an imputation model is required. Our proposed variance estimator is nevertheless robust to misspecification of the second moment of the model. Under this approach, we show the somewhat counter-intuitive result that the total variance of the imputed estimator can be smaller than the sampling variance of the complete-data estimator. We also show that the naïve variance estimator (i.e. the variance estimator obtained by treating the imputed values as observed values) is a consistent estimator of the total variance when the sampling fraction is negligible. In the second approach, the validity of an imputation model is not required but response probabilities need to be estimated. Our mean squared error estimator is obtained using robust estimates of response probabilities and is thus only weakly dependent on modeling assumptions. We also show that both approaches lead to asymptotically equivalent total mean squared errors provided that the imputation model underlying the imputed estimator is correctly specified and the sampling fraction is negligible. Finally, we propose a hybrid variance estimator that can be viewed as a compromise between the two approaches. A simulation study illustrates the robustness of our proposed variance (mean squared error) estimators.

Key words and phrases: Cold-deck imputation, imputation model, nonresponse model, response probability, robust variance estimator, self-efficiency.