Conformal Causal Inference for Cluster Randomized Trials: Model-robust Inference Without Asymptotic Approximations

Bingkai Wang, Fan Li and Mengxin Yu

doi:10.5705/ss.202025.0476

Abstract

Traditional statistical inference in cluster randomized trials typically invokes asymptotic theory that requires the number of clusters to approach infinity. In this article, we propose an alternative conformal causal inference framework for analyzing cluster randomized trials that achieves the target inferential goal in finite samples without the need for asymptotic approximations. Unlike traditional inference, which focuses on estimating the average treatment effect, our conformal causal inference aims to provide prediction intervals for the difference of counterfactual outcomes, thereby providing a new decision-making tool for clusters and individuals in the same target population. We prove that this framework is compatible with arbitrary working outcome models, including data-adaptive machine learning methods that maximally leverage information from baseline covariates, and enjoys robustness against misspecification of working outcome models. Under our conformal causal inference framework, we develop efficient computational algorithms to construct prediction intervals for treatment effects at both the cluster and individual levels, and further extend the framework to address inferential targets defined based on pre-specified covariate subgroups. Finally, we demonstrate the properties of our methods via simulations and a real data application based on a cluster randomized trial for treating chronic pain.

Key words and phrases: Conformal prediction, machine learning, individual-level treatment effect, cluster-level treatment effect, finite-sample coverage
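As a rough illustration of the cluster-level idea in the abstract, the sketch below applies generic split conformal prediction (Papadopoulos et al., 2002; Lei and Candès, 2021) to cluster-level summary outcomes within each arm and combines the two arm-specific intervals with a union bound. This is not the authors' algorithm. It assumes that clusters are drawn i.i.d. from the target population, that treatment is randomized independently of covariates and potential outcomes, that each cluster is summarized by a single outcome (for example, its mean), and that a linear working model is used. All function names (conformal_quantile, fit_linear, arm_interval, effect_interval) are hypothetical. Under these assumptions, the returned interval covers Y(1) - Y(0) for a new cluster with probability at least 1 - alpha, marginally over the data.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' method: generic split conformal on
# cluster-level outcomes, assuming i.i.d. clusters and randomized treatment.


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns +inf when there are too few calibration scores for a finite bound.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else float(scores[k - 1])


def fit_linear(X, y):
    """Working outcome model: least squares with intercept (any regressor fits here)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def arm_interval(X, y, x_new, alpha, rng):
    """Split-conformal interval for one arm's cluster-level counterfactual outcome."""
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, calib = idx[:half], idx[half:]
    model = fit_linear(X[train], y[train])
    # Absolute-residual conformity scores on held-out (calibration) clusters.
    q = conformal_quantile(np.abs(y[calib] - model(X[calib])), alpha)
    pred = model(x_new[None, :])[0]
    return pred - q, pred + q


def effect_interval(X, y, a, x_new, alpha=0.1, seed=0):
    """Interval for Y(1) - Y(0) of a new cluster with covariates x_new."""
    rng = np.random.default_rng(seed)
    lo1, hi1 = arm_interval(X[a == 1], y[a == 1], x_new, alpha / 2, rng)
    lo0, hi0 = arm_interval(X[a == 0], y[a == 0], x_new, alpha / 2, rng)
    # Union bound: P(Y(1) in [lo1, hi1] and Y(0) in [lo0, hi0]) >= 1 - alpha.
    return lo1 - hi0, hi1 - lo0


# Toy example: 200 clusters, half randomized to treatment, effect of 2.
rng = np.random.default_rng(1)
m = 200
X = rng.normal(size=(m, 2))  # cluster-level covariates, e.g. cluster means
a = rng.permutation(np.repeat([0, 1], m // 2))
y = X @ np.array([1.0, -0.5]) + 2.0 * a + rng.normal(scale=0.5, size=m)
lo, hi = effect_interval(X, y, a, x_new=np.zeros(2), alpha=0.1)
print(f"90% prediction interval for Y(1) - Y(0): [{lo:.2f}, {hi:.2f}]")
```

Any regressor can replace fit_linear without affecting the finite-sample guarantee, which is one sense in which such intervals are robust to working-model misspecification; a poor model tends only to widen the interval. Splitting alpha evenly across the two arms is conservative. An individual-level analogue must respect within-cluster dependence, for example by calibrating on one randomly sampled individual per cluster as in Dunn et al. (2023).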

Information

Preprint No.	SS-2025-0476
Manuscript ID	SS-2025-0476
Complete Authors	Bingkai Wang, Fan Li, Mengxin Yu
Corresponding Authors	Bingkai Wang
Emails	bingkai.w@gmail.com

References

Alaa, A. M., Z. Ahmad, and M. van der Laan (2023). Conformal meta-learners for predictive inference of individual treatment effects. Advances in Neural Information Processing Systems 36, 47682–47703.
Balzer, L. B., M. L. Petersen, M. J. van der Laan, and S. Collaboration (2016). Targeted estimation and inference for the sample average treatment effect in trials with and without pair-matching. Statistics in Medicine 35(21), 3717–3732.
Balzer, L. B., M. L. Petersen, M. J. van der Laan, and S. Consortium (2015). Adaptive pair-matching in randomized trials with unbiased and efficient effect estimation. Statistics in Medicine 34(6), 999–1011.
Balzer, L. B., M. van der Laan, J. Ayieko, M. Kamya, G. Chamie, J. Schwab, D. V. Havlir, and M. L. Petersen (2023). Two-stage TMLE to reduce bias and improve efficiency in cluster randomized trials. Biostatistics 24(2), 502–517.
Balzer, L. B., M. J. van der Laan, M. L. Petersen, and S. Collaboration (2016). Adaptive pre-specification in randomized trials with and without pair-matching. Statistics in Medicine 35(25), 4528–4545.
Barber, R. F., E. J. Candès, A. Ramdas, and R. J. Tibshirani (2021). Predictive inference with the jackknife+. The Annals of Statistics 49(1), 486–507.
Barber, R. F., E. J. Candès, A. Ramdas, and R. J. Tibshirani (2023). Conformal prediction beyond exchangeability. The Annals of Statistics 51(2), 816–845.
Benitez, A., M. L. Petersen, M. J. van der Laan, N. Santos, E. Butrick, D. Walker, R. Ghosh, P. Otieno, P. Waiswa, and L. B. Balzer (2023). Defining and estimating effects in cluster randomized trials: A methods comparison. Statistics in Medicine 42(19), 3443–3466.
Breiman, L. (2001). Random forests. Machine Learning 45, 5–32.
DeBar, L., M. Mayhew, L. Benes, A. Bonifay, R. A. Deyo, C. R. Elder, F. J. Keefe, M. C. Leo, C. McMullen, A. Owen-Smith, et al. (2022). A primary care–based cognitive behavioral therapy intervention for long-term opioid users with chronic pain: a randomized pragmatic trial. Annals of Internal Medicine 175(1), 46–55.
Ding, P. and L. Keele (2018). Rank tests in unmatched clustered randomized trials applied to a study of teacher training. The Annals of Applied Statistics 12(4), 2151–2174.
Dobriban, E. and M. Yu (2025). SymmPI: predictive inference for data with group symmetries. Journal of the Royal Statistical Society Series B: Statistical Methodology, qkaf022.
Dunn, R., L. Wasserman, and A. Ramdas (2023). Distribution-free prediction sets for two-layer hierarchical models. Journal of the American Statistical Association 118(544), 2491–2502.
Hayes, R. J. and L. H. Moulton (2017). Cluster randomised trials. Chapman and Hall/CRC.
Hore, R. and R. F. Barber (2025). Conformal prediction with local weights: randomization enables robust guarantees. Journal of the Royal Statistical Society Series B: Statistical Methodology 87(2), 549–578.
Jin, Y., Z. Ren, and E. J. Candès (2023). Sensitivity analysis of individual treatment effects: A robust conformal inference approach. Proceedings of the National Academy of Sciences 120(6), e2214889120.
Kahan, B. C., F. Li, A. J. Copas, and M. O. Harhay (2023). Estimands in cluster-randomized trials: choosing analyses that answer the right question. International Journal of Epidemiology 52(1), 107–118.
Lee, Y., R. Barber, and R. Willett (2023). Distribution-free inference with hierarchical data. ACM Journal of Data Science.
Lei, L. and E. J. Candès (2021). Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology 83(5), 911–938.
Li, F., J. Tong, X. Fang, C. Cheng, B. C. Kahan, and B. Wang (2025). Model-robust standardization in cluster-randomized trials. Statistics in Medicine 44(20-22), e70270.
Murray, D. M. et al. (1998). Design and Analysis of Group-Randomized Trials, Volume 29. Oxford University Press, USA.
Papadopoulos, H., K. Proedrou, V. Vovk, and A. Gammerman (2002). Inductive confidence machines for regression. In Machine Learning: ECML 2002: 13th European Conference on Machine Learning Helsinki, Finland, August 19–23, 2002 Proceedings 13, pp. 345–356. Springer.
Qiu, H., E. Dobriban, and E. Tchetgen Tchetgen (2023). Prediction sets adaptive to unknown covariate shift. Journal of the Royal Statistical Society Series B: Statistical Methodology 85(5), 1680–1705.
Rabideau, D. J. and R. Wang (2021). Randomization-based confidence intervals for cluster randomized trials. Biostatistics 22(4), 913–927.
Small, D. S., T. R. Ten Have, and P. R. Rosenbaum (2008). Randomization inference in a group-randomized trial of treatments for depression: covariate adjustment, noncompliance, and quantile effects. Journal of the American Statistical Association 103(481), 271–279.
Su, F. and P. Ding (2021). Model-assisted analyses of cluster-randomized experiments. Journal of the Royal Statistical Society Series B: Statistical Methodology 83(5), 994–1015.
Tibshirani, R. J., R. Foygel Barber, E. Candès, and A. Ramdas (2019). Conformal prediction under covariate shift. Advances in Neural Information Processing Systems 32, 1–11.
van der Laan, M. J., E. C. Polley, and A. E. Hubbard (2007). Super learner. Statistical Applications in Genetics and Molecular Biology 6(1), 1–21.
Vovk, V. (2012). Conditional validity of inductive conformal predictors. In Asian Conference on Machine Learning, pp. 475–490. PMLR.
Vovk, V., A. Gammerman, and G. Shafer (2005). Algorithmic learning in a random world, Volume 29. Springer.
Wang, B., M. O. Harhay, J. Tong, D. S. Small, T. P. Morris, and F. Li (2026). On the mixed-model analysis of covariance in cluster-randomized trials. Statistical Science 41(1), 49.
Wang, B., C. Park, D. S. Small, and F. Li (2024). Model-robust and efficient covariate adjustment for cluster-randomized experiments. Journal of the American Statistical Association 119(548), 2959–2971.
Wang, X., K. S. Goldfeld, M. Taljaard, and F. Li (2024). Sample size requirements to test subgroup-specific treatment effects in cluster-randomized trials. Prevention Science 25(Suppl 3), 356–370.
Wu, J. and P. Ding (2021). Randomization tests for weak null hypotheses in randomized experiments. Journal of the American Statistical Association 116(536), 1898–1913.
Yang, Y., A. K. Kuchibhotla, and E. Tchetgen Tchetgen (2024). Doubly robust calibration of prediction sets under covariate shift. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(4), 943–965.
Yin, M., C. Shi, Y. Wang, and D. M. Blei (2024). Conformal sensitivity analysis for individual treatment effects. Journal of the American Statistical Association 119(545), 122–135.

Acknowledgments

Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R00AI173395. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Supplementary Materials

The Supplementary Materials include the Web Appendices, Tables, Figures, and code referenced in Sections 3–6.

Supplementary materials are available for download.
