Doubly Robust Estimation of Optimal Individual Treatment Regime in A Semi-supervised Framework

Xintong Li, Mengjiao Peng and Yong Zhou

doi:10.5705/ss.202025.0168

Abstract

In many health-care datasets, such as electronic health record (EHR) datasets, collecting labeled data can be laborious and expensive, resulting in a scarcity of labeled data while unlabeled data are already available. This has sparked a growing interest in developing methods to leverage the abundant unlabeled data. We thus develop several types of semi-supervised (SS) methods for estimating the optimal individualized treatment regime (ITR) that utilize both labeled and unlabeled data in a general model-free framework, with efficiency gains compared to supervised estimation methods. Our proposed method first utilizes a flexible imputation technique through single-index kernel smoothing to exploit the unlabeled data, which performs well even in cases of multidimensional covariates, with a follow-up estimation to determine the optimal ITR by directly optimizing the imputed value function. Additionally, in cases where the propensity score function is unknown, as in observational studies, we also develop a doubly robust SS estimation method based on a class of monotonic index models. Our estimators are shown to be consistent with the cube-root convergence rate and exhibit a nonstandard asymptotic distribution characterized as the maximizer of a centered Gaussian process with a quadratic drift. Simulation studies demonstrate the efficiency and robustness of the proposed methods compared to the supervised approach in finite samples. Additionally, a practical example from the ACTG 175 study illustrates their real-world application.
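To make the idea of directly optimizing a doubly robust value function concrete, the following is a minimal sketch of the generic augmented inverse probability weighted (AIPW) value estimator combined with a grid search over a simple class of regimes. It is not the authors' semi-supervised estimator: the imputation step with unlabeled data is omitted, and the function name `aipw_value`, the simulated data, and the threshold regime class d_c(x) = 1{x > c} are all assumptions made for illustration.

```python
import numpy as np

def aipw_value(d, A, Y, pi, mu1, mu0):
    """Doubly robust (AIPW) estimate of the mean outcome under regime d.

    d        : 0/1 array, treatment recommended by the regime for each subject
    A        : 0/1 array, treatment actually received
    Y        : observed outcomes (larger is better)
    pi       : propensity scores P(A = 1 | X)
    mu1, mu0 : outcome-model predictions E[Y | X, A = 1] and E[Y | X, A = 0]
    """
    agree = A * d + (1 - A) * (1 - d)        # observed treatment follows d
    p_agree = pi * d + (1 - pi) * (1 - d)    # probability of following d
    m_d = mu1 * d + mu0 * (1 - d)            # predicted outcome under d
    return np.mean(agree * Y / p_agree - (agree - p_agree) / p_agree * m_d)

# Hypothetical simulated trial: treatment helps only when x > 0.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)
Y = x + 2.0 * A * x + rng.normal(size=n)
pi = np.full(n, 0.5)                         # known randomization probability
mu1 = np.zeros(n)                            # deliberately misspecified
mu0 = np.zeros(n)                            # outcome model (all zeros)

# Direct search over threshold regimes d_c(x) = 1{x > c}.
grid = np.linspace(-2.0, 2.0, 41)
values = np.array(
    [aipw_value((x > c).astype(float), A, Y, pi, mu1, mu0) for c in grid]
)
best_c = grid[np.argmax(values)]
print(f"estimated threshold: {best_c:.2f}, estimated value: {values.max():.3f}")
```

The outcome model is set to zero on purpose: because the propensity score is correct, the estimator remains consistent, which is one half of the double robustness property. The estimated value is a step function of the threshold, so a grid search is used here rather than a gradient-based optimizer; this non-smoothness is also the usual source of cube-root asymptotics in problems of this type.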

Key words and phrases: Optimal treatment regime, Semi-supervised inference, Doubly robustness, Precision medicine

Information

Preprint No.	SS-2025-0168
Manuscript ID	SS-2025-0168
Complete Authors	Xintong Li, Mengjiao Peng, Yong Zhou
Corresponding Authors	Mengjiao Peng
Emails	mjpeng@fem.ecnu.edu.cn

References

Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer, Cham.
Athey, S., Tibshirani, J., and Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47:1148–1178.
Athey, S. and Wager, S. (2021). Policy learning with observational data. Econometrica, 89:133–161.
Blatt, D., Murphy, S. A., and Zhu, J. (2004). A-learning for approximate planning. Ann Arbor, 1001:48109–2122.
Carré, N., Deveau, C., Belanger, F., Boufassa, F., Persoz, A., Jadand, C., Rouzioux, C., Delfraissy, J.-F., Bucquet, D., Group, S. S., et al. (1994). Effect of age and exposure group on the onset of aids in heterosexual and homosexual hiv-infected patients. Aids, 8(6):797–802.
Chakrabortty, A. and Cai, T. (2018). Efficient and adaptive linear regression in semi-supervised settings. Ann. Statist., 46:1541–1572.
Chakrabortty, A., Dai, G., and Tchetgen, E. T. (2022). A general framework for treatment effect estimation in semi-supervised and high dimensional settings. arXiv preprint arXiv:2201.00468.
Chakraborty, B., Murphy, S., and Strecher, V. (2010). Inference for nonregular parameters in optimal dynamic treatment regimes. Statistical methods in medical research, 19(3):317–343.
Chapelle, O., Schölkopf, B., and Zien, A. (2006). Semi-supervised learning. The MIT Press, Cambridge, Massachusetts.
Cheng, D., Ananthakrishnan, A. N., and Cai, T. (2021). Robust and efficient semi-supervised estimation of average treatment effects with application to electronic health records data. Biometrics, 77(2):413–423.
Chu, J., Lu, W., and Yang, S. (2023). Targeted optimal treatment regime learning using summary statistics. Biometrika, 110(4):913–931.
Correa, N., Cerquides, J., Vassena, R., Popovic, M., and Arcos, J. L. (2024). Idoser: Improving individualized dosing policies with clinical practice and machine learning. Expert Systems with Applications, 238:121796.
Darbyshire, J., Foulkes, M., Peto, R., Duncan, W., Babiker, A., Collins, R., Hughes, M., Peto, T. E., Walker, S. A., and Group, C. H. (1996). Zidovudine (azt) versus azt plus didanosine (ddi) versus azt plus zalcitabine (ddc) in hiv infected adults. Cochrane database of systematic reviews, 2010(3).
Ding, P. and Li, F. (2018). Causal inference: a missing data perspective. Statistical Science, 33:214–237.
Fan, C., Lu, W., Song, R., and Zhou, Y. (2017). Concordance-assisted learning for estimating optimal individualized treatment regimes. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(5):1565–1582.
Feng, H., Duan, J., Ning, Y., and Zhao, J. (2024). Test of significance for high-dimensional thresholds with application to individualized minimal clinically important difference. Journal of the American Statistical Association, 119(546):1396–1408.
Feng, H., Ning, Y., and Zhao, J. (2022). Nonregular and minimax estimation of individualized thresholds in high dimension with binary responses. The Annals of Statistics, 50(4):2284–2305.
Friedland, G. H., Saltzman, B., Vileno, J., Freeman, K., Schrager, L. K., and Klein, R. S. (1991). Survival differences in patients with aids. JAIDS Journal of Acquired Immune Deficiency Syndromes, 4(2):144–153.
Gunn, K., Lu, W., and Song, R. (2024). Adaptive semi-supervised inference for optimal treatment decisions with electronic medical record data. In Statistics in Precision Health: Theory, Methods and Applications, pages 229–246. Springer.
Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in hiv-infected adults with cd4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335(15):1081–1090.
Jin, Z., Ying, Z., and Wei, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika, 88(2):381–390.
Keoshkerian, E., Ashton, L. J., Smith, D. G., Ziegler, J. B., Kaldor, J. M., Cooper, D. A., Stewart, G. J., and Ffrench, R. A. (2003). Effector hiv-specific cytotoxic t-lymphocyte activity in long-term nonprogressors: Associations with viral replication and progression. Journal of medical virology, 71(4):483–491.
Kitagawa, T. and Tetenov, A. (2018). Who should be treated? empirical welfare maximization methods for treatment choice. Econometrica, 86:591–616.
Langford, S. E., Ananworanich, J., and Cooper, D. A. (2007). Predictors of disease progression in hiv infection: a review. AIDS research and therapy, 4:1–14.
Liao, K. P., Cai, T., Gainer, V., Goryachev, S., Zeng-treitler, Q., Raychaudhuri, S., Szolovits, P., Churchill, S., Murphy, S., Kohane, I., et al. (2010). Electronic medical records for discovery research in rheumatoid arthritis. Arthritis care & research, 62(8):1120–1127.
Liu, Y., Wang, Y., Kosorok, M., Zhao, Y.-Q., and Zeng, D. (2018). Augmented outcome-weighted learning for estimating optimal dynamic treatment regimens. Statistics in Medicine, 37(22):3776–3788.
Lu, W., Zhang, H., and Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5):493–504.
Manski, C. F. (2004). Statistical treatment rules for heterogeneous populations. Econometrica, 72(4):1221–1246.
Mauss, S., Adams, O., Willers, R., and Jablonowski, H. (1996). Combination therapy with zdv+ ddi versus zdv+ ddc in patients with progression of hiv-infection under treatment with zdv. JAIDS Journal of Acquired Immune Deficiency Syndromes, 11(5):469–477.
Mo, W. and Liu, Y. (2022). Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment-free effect models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):440–472.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):331–355.
Ogg, G. S., Jin, X., Bonhoeffer, S., Dunbar, P. R., Nowak, M. A., Monard, S., Segal, J. P., Cao, Y., Rowland-Jones, S. L., Cerundolo, V., et al. (1998). Quantitation of hiv-1-specific cytotoxic t lymphocytes and plasma load of viral rna. Science, 279(5359):2103–2106.
Peng, L. and Huang, Y. (2008). Survival analysis with quantile regression models. Journal of the American Statistical Association, 103(482):637–649.
Phillips, A. N. and Lundgren, J. D. (2006). The cd4 lymphocyte count and risk of clinical progression. Current opinion in HIV and AIDS, 1(1):43–49.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of statistics, 39(2):1180.
Ragni, M. V., Amato, D. A., LoFaro, M. L., DeGruttola, V., Van Der Horst, C., Eyster, M. E., Kessler, C. M., Gjerset, G. F., Ho, M., Parenti, D. M., et al. (1995). Randomized study of didanosine monotherapy and combination therapy with zidovudine in hemophilic and nonhemophilic subjects with asymptomatic human immunodeficiency virus-1 infection. Blood, 85(9):2337–2346.
Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Lin, D. Y. and Heagerty, P. J., editors, Proceedings of the Second Seattle Symposium in Biostatistics, volume 179 of Lecture Notes in Statistics, pages 189–326, New York, NY, USA. Springer.
Robins, J. M., Hernán, M. A., and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. American journal of epidemiology, 152(4):327–333.
Robins, J. M., Rotnitzky, A., and Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89:846–866.
Robins, J. M., Rotnitzky, A., and Zhao, L. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90:106–121.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688.
Schoenbaum, E. E., Hartel, D., and Friedland, G. (1990). Hiv infection and intravenous drug use. Current Opinion in Infectious Diseases, 3(1):80–93.
Shi, C., Fan, A., Song, R., and Lu, W. (2018). High-dimensional a-learning for optimal dynamic treatment regimes. The Annals of Statistics, 46:925–957.
Sonabend-W, A., Laha, N., Ananthakrishnan, A. N., Cai, T., and Mukherjee, R. (2023). Semi-supervised off-policy reinforcement learning and value estimation for dynamic treatment regimes. Journal of Machine Learning Research, 24(323):1–86.
Song, S., Lin, Y., and Zhou, Y. (2023). A general m-estimation theory in semi-supervised framework. Journal of the American Statistical Association, pages 1–11.
Wang, L., Zhou, Y., Song, R., and Sherwood, B. (2018). Quantile-optimal treatment regimes. Journal of the American Statistical Association, 113(523):1243–1254.
Wang, Y., Zhou, Q., Cai, T., and Wang, X. (2023). Semi-supervised estimation of event rate with doubly-censored survival data. arXiv preprint arXiv:2311.02574.
Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
Watkins, C. J. H. (1989). Learning from delayed rewards. PhD thesis, King’s College.
Zhang, A., Brown, L. D., and Cai, T. T. (2019). Semi-supervised inference: General theory and estimation of means. The Annals of Statistics, 47(5):2538–2566.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68:1010–1018.
Zhang, Y. and Imai, K. (2023). Individualized policy evaluation and learning under clustered network interference. arXiv preprint arXiv:2311.02467.
Zhao, Y.-Q., Laber, E., Ning, Y., Saha, S., and Sands, B. (2019). Efficient augmentation and relaxation learning for individualized treatment rules using observational data. Journal of Machine Learning Research, 20:1–23.
Zhao, Y.-Q., Zeng, D., Rush, A., and Kosorok, M. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107:1106–1118.
Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112:169–187.

Acknowledgments

The authors thank the co-editor, the associate editor and the reviewers for their helpful suggestions. Zhou and Peng’s work was supported by the National Key R&D Program of China (2021YFA1000100, 2021YFA1000101 and 2021YFA1000104) and the Shanghai Key Program of Computational Biology (23JS1400500). Zhou’s work was supported by the National Natural Science Foundation of China (71931004). Peng’s work was supported by the National Natural Science Foundation of China (12301337, 72331005).

Supplementary Materials

The online Supplementary Material contains additional asymptotic results, theoretical proofs, and additional numerical descriptions and results.

Supplementary materials are available for download.
