Statistica Sinica

Cuiling Wang and Myunghee Cho Paik

Abstract: Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random (MAR). Among them, the three main non-likelihood approaches are based on weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of its observation probability. The conditional likelihood method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that the record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare their asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method, provided that the observation model is correctly specified. We also show that if the missingness mechanism is missing completely at random (MCAR) and the true, known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. Finally, we conduct simulation studies to compare performance in finite samples and illustrate the methods using data from a stroke study.

Key words and phrases: Efficiency, estimating equation, imputation, inverse probability weighting, logistic regression, missing at random, missing covariate.
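
As a rough illustration of how complete case analysis and inverse probability weighting can differ under MAR, the following Python sketch simulates a logistic regression with one partially observed covariate. Everything in it is an assumption made for illustration rather than the paper's simulation design: the function names (`fit_logistic`, `estimate_pi`, `simulate`), the coefficient values, and the observation probabilities. The observation probability is estimated by cell proportions over the fully observed binary variables.

```python
import numpy as np


def fit_logistic(X, y, w=None, n_iter=25):
    """Solve the weighted score equation sum_i w_i x_i (y_i - p_i) = 0
    by Newton-Raphson. With w = None all weights are 1."""
    if w is None:
        w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, score)
    return beta


def estimate_pi(y, z, r):
    """Estimate P(R = 1 | y, z) by the observed proportion in each (y, z) cell."""
    pi_hat = np.empty(len(y))
    for yv in (0, 1):
        for zv in (0, 1):
            cell = (y == yv) & (z == zv)
            pi_hat[cell] = r[cell].mean()
    return pi_hat


def simulate(n=20000, seed=0):
    """Hypothetical design: z and y are always observed, x is sometimes missing."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, n)
    x = rng.normal(0.5 * z, 1.0)
    beta_true = np.array([-1.0, 1.0, 0.5])  # assumed (intercept, x, z)
    design = np.column_stack([np.ones(n), x, z])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-design @ beta_true)))
    # MAR: the observation probability depends only on the observed (y, z).
    pi = np.array([[0.9, 0.3], [0.4, 0.8]])[y, z]  # assumed values
    r = rng.binomial(1, pi).astype(bool)
    return design, y, z, r, beta_true


design, y, z, r, beta_true = simulate()
pi_hat = estimate_pi(y, z, r)

# Complete case analysis: unweighted fit on records with x observed.
cc = fit_logistic(design[r], y[r])
# Inverse probability weighting: same records, weighted by 1 / pi_hat.
ipw = fit_logistic(design[r], y[r], w=1.0 / pi_hat[r])

print("true:", beta_true)
print("complete case:", np.round(cc, 3))
print("IPW:", np.round(ipw, 3))
```

Under this assumed design the observation probability depends jointly on the outcome and the fully observed covariate, so the complete case estimate of the coefficient on `z` is expected to be biased, while the weighted estimate should be close to the assumed true values. The sketch says nothing about relative efficiency, which is the subject of the asymptotic comparisons described in the abstract.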