Abstract

In policy learning, the goal is typically to optimize a primary performance metric, but other subsidiary metrics often also warrant attention. This paper presents two strategies for evaluating these subsidiary metrics under a policy that is optimal for the primary one. The first relies on a novel margin condition that facilitates Wald-type inference. Under this and other regularity conditions, we show that the one-step corrected estimator is efficient. Despite the utility of this margin condition, it places strong restrictions on how the subsidiary metric behaves for nearly-optimal policies, which may not hold in practice. We therefore introduce alternative, two-stage strategies that do not require a margin condition. The first stage constructs a set of candidate policies and the second builds a uniform confidence interval over this set. We provide numerical simulations to evaluate the performance of these methods in different scenarios.

Information

Preprint No.: SS-2024-0013
Manuscript ID: SS-2024-0013
Complete Authors: Zhaoqi Li, Houssam Nassif, Alex Luedtke
Corresponding Author: Zhaoqi Li
Email: zli9@stanford.edu

Acknowledgments

This work was supported by National Institutes of Health award DP2-LM013340 and National Science Foundation award DMS-2210216.

Supplementary Materials

The online Supplementary Material contains proofs of main theorems and lemmas, and additional experiments.