Learning Optimal Treatment Regimes with Survival Data under Imperfect Compliance: An Instrumental Variable Approach

Yifan Cui, Jianhua Guo, Wendong Li, Frank Tanser and Dongdong Xiang

The authors are listed in alphabetical order.

doi:10.5705/ss.202024.0420

Abstract

Estimating individualized optimal treatment regimes (OTR) is a central task for precision medicine. The clinical outcome of interest is often a survival time that is censored for reasons such as early dropout. Additionally, it is hard to completely rule out confounding by unmeasured factors in observational studies and in randomized trials subject to imperfect compliance. These issues make estimating OTR extremely challenging. In this paper, we propose an instrumental variable (IV) approach to estimate OTR in the presence of data censoring and unmeasured confounding subject to imperfect compliance. By introducing a binary IV into the outcome-weighted learning framework, we establish the identification of OTR based on a no unmeasured common effect modifier assumption. We also derive a doubly robust estimator with cross-fitting to protect against model misspecification. A comparison between our proposed treatment regimes and intention-to-treat analysis further shows the superiority of our methods in practice. We illustrate the proposed methods using simulation studies and a real application to an HIV dataset, providing further empirical evidence that living in a community with high coverage of antiretroviral therapy reduces the risk of acquiring HIV.

Key words and phrases: Causal inference, unmeasured confounding, survival data, imperfect compliance, instrumental variable, optimal treatment regimes

Information

Preprint No.	SS-2024-0420
Manuscript ID	SS-2024-0420
Complete Authors	Yifan Cui, Jianhua Guo, Wendong Li, Frank Tanser, Dongdong Xiang
Corresponding Authors	Dongdong Xiang
Emails	ddxiang@sfs.ecnu.edu.cn


Acknowledgments

The authors thank the editors and anonymous referees for their valuable comments and constructive suggestions, which significantly improved the quality of this work. This work was partially supported by the National Key R&D Program of China (2024YFA1015600, 2022YFA1003801, 2021YFA1000101, 2021YFA1000102, 2020YFA0714102), the National Natural Science Foundation of China (12431009, 12471254, 12471266, 12201382, 12071144, U23A2064), the Shanghai Pilot Program for Basic Research (TQ20240201), and the Basic Research Project of Shanghai Science and Technology Commission (22JC1400800). Frank Tanser is supported by the National Institute of Mental Health (NIMH) (Award # R01MH131480). AHRI’s Demographic Surveillance Information System and Population Intervention Program is funded by the Wellcome Trust (227167/A/23/Z) and the South Africa Population Research Infrastructure Network (funded by the South African Department of Science and Technology and hosted by the South African Medical Research Council).

Supplementary Materials

The supplementary materials contain detailed proofs of Theorems 1-2 and Proposition 1, as well as theoretical results on Fisher consistency, the excess risk bound, and universal consistency of the estimated treatment regimes.

Supplementary materials are available for download.
