Abstract
Suppose we are interested in the mean of an outcome that is subject to nonignorable nonresponse. This paper develops new semiparametric estimation methods with instrumental variables that affect nonresponse but not the outcome. The proposed estimators remain consistent and asymptotically normal even under partial misspecification of the models for two variation-independent nuisance components. We evaluate the performance of the proposed estimators via a simulation study, and apply them to adjust for missing data induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer experience as an instrumental variable.
Information
| Preprint No. | SS-2023-0383 |
|---|---|
| Manuscript ID | SS-2023-0383 |
| Complete Authors | Baoluo Sun, Wang Miao, Deshanee S. Wickramarachchi |
| Corresponding Authors | Baoluo Sun |
| Emails | stasb@nus.edu.sg |
Acknowledgments
Baoluo Sun’s work is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 (A-8000452-00-00). Wang Miao’s research is supported by the National Key R&D Program of China (2022YFA1008100). The authors would like to thank the anonymous referees, an Associate Editor, and the Editor for their constructive comments, which led to a much improved paper.
Supplementary Materials
The online Supplementary Material contains proofs of Propositions 1 and 2, further discussion of local efficiency, and additional simulation results.