Abstract

Existing methods for handling nonignorable missing data often rely on strong modeling assumptions, making them vulnerable to model misspecification. This paper proposes a conformal prediction framework for constructing prediction sets under nonignorable missing responses, which is model-free for the outcome regression while relying on a consistently estimated propensity score. Our framework addresses two central challenges posed by nonignorable missingness: non-identifiability and the lack of data exchangeability. The key idea is to construct the highest conditional density prediction set using a local subset near the target point, while correcting for selection bias via modeling the missingness mechanism. Within this framework, we develop a bias-adjusted semiparametric method for conditional density estimation, which fits a quantile process to the observed data and corrects for bias using propensity weights. This estimator integrates seamlessly into the conformal framework, allowing our approach to guarantee not only marginal coverage, but also local and asymptotic conditional coverage for any new subject, while achieving asymptotically optimal interval lengths. We demonstrate the validity and efficiency of our procedure through simulation studies and an application to a real HIV-CD4 dataset.
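
To make the idea of correcting selection bias with propensity weights concrete, the following is a minimal sketch of a generic inverse-propensity-weighted split-conformal set. It is not the procedure proposed in the paper: it swaps in an arbitrary conformity score for the highest-conditional-density construction, omits the local subset near the target point, and treats the propensity as known. The names `ipw_conformal_set`, `score_fn`, and `propensity` are hypothetical and introduced only for illustration.

```python
import numpy as np

def ipw_conformal_set(scores_cal, pi_cal, score_fn, propensity, x_new, y_grid, alpha=0.1):
    """Weighted split-conformal set for a new covariate value x_new.

    scores_cal : conformity scores of calibration subjects with observed responses
    pi_cal     : their propensities pi(x_i, y_i) (assumed known or already estimated)
    score_fn   : hypothetical score s(x, y), e.g. |y - mu_hat(x)|
    propensity : hypothetical function pi(x, y) = P(response observed | x, y)
    y_grid     : candidate response values to test for inclusion
    """
    scores_cal = np.asarray(scores_cal, dtype=float)
    w_cal = 1.0 / np.asarray(pi_cal, dtype=float)  # inverse-propensity weights
    kept = []
    for y in y_grid:
        s_y = score_fn(x_new, y)
        w_y = 1.0 / propensity(x_new, y)  # weight attached to the candidate point
        # Normalised weighted mass of calibration scores strictly below s_y;
        # the candidate's own weight enters only through the denominator.
        mass_below = w_cal[scores_cal < s_y].sum() / (w_cal.sum() + w_y)
        if mass_below < 1.0 - alpha:
            kept.append(y)
    return np.array(kept)


# Toy usage with made-up numbers: a constant propensity gives equal weights,
# so the rule reduces to ordinary split conformal prediction.
scores = np.arange(1, 10)
pi = np.full(9, 0.5)
grid = np.arange(-10, 11)
region = ipw_conformal_set(scores, pi, lambda x, y: abs(y),
                           lambda x, y: 0.5, 0.0, grid, alpha=0.25)
```

Because the weight of the candidate point depends on the unobserved response, the set is obtained by scanning a grid of candidate values rather than from a single quantile. A smaller propensity at the candidate point enlarges the denominator and can only make the set larger.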

Information

Preprint No.: SS-2025-0156
Manuscript ID: SS-2025-0156
Complete Authors: Menghan Yi, Yingying Zhang, Yanlin Tang, Huixia Judy Wang
Corresponding Authors: Yanlin Tang
Emails: yanlintang2018@163.com

KLATASDS-MOE, School of Statistics, East China Normal University

Acknowledgments

We sincerely thank the Editor, Associate Editor, and two anonymous reviewers for their constructive comments and helpful suggestions. The research of Tang and Zhang was partially supported by the National Natural Science Foundation of China (grants 12371265 and 12471280), the Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (grant JYB2025XDXM904), and the Chern Institute of Mathematics, Nankai University. The research of Wang was partially supported by the National Science Foundation (grant DMS-2436216).

Supplementary Materials

The online Supplementary Materials provide detailed technical proofs, high-dimensional extensions, and additional numerical experiments.


Supplementary materials are available for download.