Statistica Sinica 34 (2024), 637-655
Esther Eustache*1, Audrey-Anne Vallée2 and Yves Tillé1
Abstract: The estimator of a parameter of interest can be significantly affected by missing values, which introduce bias and additional variability. Swiss cheese nonresponse, also known as nonmonotone nonresponse, occurs when any variable of a survey may contain missing values without any particular pattern, which makes it difficult to handle. To reduce the effects of nonresponse, missing values are usually imputed. However, when several variables of a data set need to be imputed, it can be difficult to preserve the distributions of the variables and the relationships between them. In this paper, we propose a new donor imputation method that generalizes balanced k-nearest neighbor imputation and is applicable to any configuration of item nonresponse. This new method uses random imputation by donors and is constructed to meet the following requirements. First, all missing values of a unit should be imputed by the same donor. Next, a unit with missing values should be imputed by a neighboring donor. Last, the donors are selected to satisfy balancing constraints that allow us to decrease the variance of the estimator. The method is divided into two phases. First, we create a stratification by computing a matrix of imputation probabilities using linear programming. Then, we select donors using these imputation probabilities and balanced stratified sampling.
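The first two requirements on donors (all missing values of a unit come from a single donor, and that donor is a neighbor of the unit) can be illustrated with plain random k-nearest-neighbor donor imputation. The following is a simplified sketch and not the balanced method proposed in the paper. It omits both the linear-programming step that builds the matrix of imputation probabilities and the balanced stratified sampling of donors, and instead draws each donor uniformly among the k nearest eligible donors. The function name `random_knn_donor_imputation`, the parameter `k`, and the distance used (mean squared difference over the variables observed for both units) are assumptions made for illustration only.

```python
import numpy as np


def random_knn_donor_imputation(X, k=3, rng=None):
    """Illustrative random k-NN donor imputation (hypothetical helper).

    Each unit with missing values (NaN) receives ALL of its missing values
    from ONE donor, drawn uniformly among its k nearest eligible donors.
    Simplification: no linear programming and no balancing constraints.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    imputed = X.copy()
    missing = np.isnan(X)

    for i in np.flatnonzero(missing.any(axis=1)):
        miss_i = missing[i]
        obs_i = ~miss_i
        # Eligible donors: other units that observe every variable unit i lacks.
        cand = np.flatnonzero(~missing[:, miss_i].any(axis=1))
        cand = cand[cand != i]
        if cand.size == 0:
            continue  # no eligible donor: leave these values missing

        # Assumed distance: mean squared difference over variables observed
        # for both the recipient and the candidate donor.
        diffs = X[cand][:, obs_i] - X[i, obs_i]
        common = ~np.isnan(diffs)
        n_common = common.sum(axis=1)
        sq = np.where(common, diffs**2, 0.0).sum(axis=1)
        dist = np.full(cand.size, np.inf)
        ok = n_common > 0
        dist[ok] = sq[ok] / n_common[ok]

        # Requirement 2: restrict the draw to the k nearest eligible donors.
        nearest = cand[np.argsort(dist, kind="stable")[:k]]
        donor = rng.choice(nearest)
        # Requirement 1: every missing value of unit i comes from the same donor.
        imputed[i, miss_i] = X[donor, miss_i]

    return imputed


if __name__ == "__main__":
    X = np.array([
        [1.0, 2.0, np.nan],
        [1.1, np.nan, 3.0],
        [0.9, 2.1, 3.2],
        [5.0, 6.0, 7.0],
        [1.0, 1.9, 2.9],
    ])
    print(random_knn_donor_imputation(X, k=2, rng=0))
```

Because donors are drawn independently for each unit, this sketch adds imputation variance; the balancing constraints described above are what the proposed method uses to reduce that variance.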
Key words and phrases: Donor imputation, linear programming, nonmonotone nonresponse, random imputation.