Statistica Sinica 35 (2025), 131-150
Abstract: Missing data are common in longitudinal data analysis and, when the missingness is informative, pose methodological challenges for unbiased estimation and valid statistical inference. In such cases, it is crucial to correctly identify the missingness mechanism and incorporate it appropriately into estimation and inference procedures. Traditional methods, such as complete-case analysis and imputation, deal with missing data under the unverifiable assumptions of missing completely at random and missing at random. We focus on identifying and estimating the parameters of the missingness mechanism under the non-ignorable missing assumption, using refreshment samples from two-wave panel data. Specifically, we propose a full-likelihood approach for the case where a parametric model is specified for the joint distribution of the two-wave data. When such a model is unavailable, we propose a semiparametric method to estimate the attrition parameters, with marginal density estimates obtained using an additional refreshment sample. We derive several asymptotic properties of the semiparametric estimators and demonstrate their numerical performance using simulations. We further propose a bootstrap-based inference procedure and assess it using simulations. Lastly, we provide a real-data application based on the Netherlands Mobility Panel study.
Key words and phrases: Additive non-ignorable missing, asymptotic normality, kernel density estimator, Netherlands Mobility Panel, wave data.