Abstract
Existing methods for handling nonignorable missing data often rely on strong modeling assumptions, making them vulnerable to model misspecification. This paper proposes a conformal prediction framework for constructing prediction sets under nonignorable missing responses; the framework is model-free for the outcome regression but relies on a consistently estimated propensity score. Our framework addresses two central challenges posed by nonignorable missingness: non-identifiability and the lack of data exchangeability. The key idea is to construct the highest conditional density prediction set using a local subset near the target point, while correcting for selection bias by modeling the missingness mechanism. Within this framework, we develop a bias-adjusted semiparametric method for conditional density estimation, which fits a quantile process to the observed data and corrects for bias using propensity weights.
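The bias-adjusted density step can be illustrated with a minimal sketch; this is not the authors' implementation. The sketch assumes a linear quantile model at each level τ and weights each observed response by the inverse propensity 1/π̂(X_i, Y_i). It then turns the fitted quantile process into a density through the Siddiqui-type difference quotient f(Q(τ | x) | x) ≈ dτ / dQ(τ | x). The following are all illustrative assumptions:

- the function names `weighted_quantile_regression` and `conditional_density_at`;
- the τ grid;
- the linear-programming formulation via SciPy's `linprog`;
- the simulated missingness model with a known (oracle) propensity.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_quantile_regression(X, y, tau, w):
    """Minimize sum_i w_i * rho_tau(y_i - X_i @ beta) as a linear program.

    X should include a column of ones if an intercept is wanted; w holds
    nonnegative weights, e.g. 1 / pi_hat(X_i, Y_i) for units with observed Y.
    """
    n, p = X.shape
    # Variables: [beta_plus (p), beta_minus (p), u (n), v (n)], all >= 0,
    # with beta = beta_plus - beta_minus and y - X @ beta = u - v.
    c = np.concatenate([np.zeros(2 * p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:2 * p]


def conditional_density_at(x0, X, y, w, taus):
    """Siddiqui-type estimate of f(y | x0) from a weighted quantile process.

    Returns (points, dens): dens is evaluated at midpoints between consecutive
    fitted conditional quantiles Q_hat(tau | x0) on the grid `taus`.
    """
    q = np.array([x0 @ weighted_quantile_regression(X, y, t, w) for t in taus])
    q = np.sort(q)  # monotone rearrangement in case fitted quantiles cross
    dq = np.maximum(np.diff(q), 1e-8)  # guard against flat spots
    dens = np.diff(taus) / dq  # f(Q(tau | x0) | x0) ~ d tau / d Q(tau | x0)
    points = 0.5 * (q[1:] + q[:-1])
    return points, dens


# Toy MNAR example: the response model and the oracle propensity are assumptions.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0.0, 1.0, n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(1.5 - 0.8 * y_full)))  # P(R = 1 | x, y) depends on y
observed = rng.uniform(size=n) < pi
X_obs = np.column_stack([np.ones(observed.sum()), x[observed]])
w_obs = 1.0 / pi[observed]  # inverse propensity weights correct selection bias
taus = np.linspace(0.05, 0.95, 19)
points, dens = conditional_density_at(
    np.array([1.0, 0.5]), X_obs, y_full[observed], w_obs, taus
)
```

In practice π̂ would be estimated from the data, since the oracle π is unavailable. Identifying π under nonignorable missingness generally requires extra structure, such as a shadow variable, which this sketch does not address.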
This estimator integrates seamlessly into the conformal framework, allowing our approach to guarantee not only marginal coverage, but also local and asymptotic conditional coverage for any new subject, while achieving asymptotically optimal interval lengths. We demonstrate the validity and efficiency of our procedure through simulation studies and an application to a real HIV-CD4 dataset.
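A density estimate can be turned into a highest-density conformal set as sketched below. The sketch assumes a split-conformal design with three ingredients:

- The nonconformity score is −f̂(y | x).
- Calibration units with observed responses receive weights 1/π̂(X_i, Y_i).
- The test point receives the weight 1/π̂(x₀, y), evaluated at each candidate y, as in weighted conformal prediction under distribution shift.

This is a simplified, hypothetical illustration rather than the paper's procedure. It omits the local-subset step near the target point. The helper names, the grid search over y, and the standard normal stand-in for f̂ are illustrative assumptions.

```python
import numpy as np


def weighted_quantile_with_inf(scores, weights, w_test, level):
    """Level-quantile of sum_i p_i * delta(s_i) + p_test * delta(+inf),

    where p_i = w_i / (sum_j w_j + w_test) and p_test = w_test / (same sum).
    """
    order = np.argsort(scores)
    s = scores[order]
    cum = np.cumsum(weights[order]) / (weights.sum() + w_test)
    idx = np.searchsorted(cum, level)  # first index with cumulative mass >= level
    return s[idx] if idx < len(s) else np.inf


def hpd_conformal_set(y_grid, dens_test, cal_scores, cal_w, w_test_fn, alpha=0.1):
    """Grid values y kept in a weighted conformal highest-density set at x0.

    dens_test[j] = f_hat(y_grid[j] | x0); cal_scores[i] = -f_hat(Y_i | X_i) for
    observed calibration units; cal_w[i] = 1 / pi_hat(X_i, Y_i); w_test_fn(y)
    returns the test weight 1 / pi_hat(x0, y), which depends on the candidate y.
    """
    keep = np.zeros(len(y_grid), dtype=bool)
    for j, y in enumerate(y_grid):
        q = weighted_quantile_with_inf(cal_scores, cal_w, w_test_fn(y), 1.0 - alpha)
        keep[j] = -dens_test[j] <= q  # low score means high estimated density
    return y_grid[keep]


def std_normal_pdf(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)


# Toy check with equal weights (ordinary split conformal); N(0, 1) stands in for f_hat.
y_grid = np.linspace(-3.0, 3.0, 61)
z_cal = np.linspace(-2.0, 2.0, 41)
region = hpd_conformal_set(
    y_grid, std_normal_pdf(y_grid), -std_normal_pdf(z_cal), np.ones(41), lambda y: 1.0
)
```

With equal weights the construction reduces to ordinary split conformal prediction. Under nonignorable missingness one would instead pass something like `w_test_fn = lambda y: 1 / pi_hat(x0, y)` for a fitted propensity `pi_hat`, which is hypothetical here. The test weight changes with y, so the conformal threshold is recomputed for every candidate value.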
Information
| Preprint No. | SS-2025-0156 |
|---|---|
| Manuscript ID | SS-2025-0156 |
| Complete Authors | Menghan Yi, Yingying Zhang, Yanlin Tang, Huixia Judy Wang |
| Corresponding Author | Yanlin Tang |
| Email | yanlintang2018@163.com |
- KLATASDS-MOE, School of Statistics, East China Normal University
Acknowledgments
We sincerely thank the Editor, Associate Editor, and two anonymous reviewers for their constructive comments and helpful suggestions. The research of Tang and Zhang was partially supported by the National Natural Science Foundation of China (grants 12371265 and 12471280), the Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (grant JYB2025XDXM904), and the Chern Institute of Mathematics, Nankai University. The research of Wang was partially supported by the National Science Foundation (grant DMS-2436216).
Supplementary Materials
The online Supplementary Materials provide detailed technical proofs, high-dimensional extensions, and additional numerical experiments.