Abstract
In policy learning, the goal is typically to optimize a primary performance metric,
but other subsidiary metrics often also warrant attention. This paper presents
two strategies for evaluating these subsidiary metrics under a policy that is optimal for the primary one.
The first relies on a novel margin condition that
facilitates Wald-type inference. Under this and other regularity conditions, we
show that the one-step corrected estimator is efficient. Despite its utility, this
margin condition places strong restrictions on how the subsidiary metric behaves for nearly optimal policies, and these restrictions may not hold in practice. We therefore
introduce alternative, two-stage strategies that do not require a margin condition. The first stage constructs a set of candidate policies and the second builds
a uniform confidence interval over this set. We provide numerical simulations to
evaluate the performance of these methods in different scenarios.
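
To make the first strategy concrete, the following is a minimal sketch of a one-step (augmented inverse-probability-weighted) estimate of a subsidiary metric under a plug-in policy that maximizes the primary metric, together with a Wald-type interval. It is an illustration under simplifying assumptions and not the paper's implementation: a binary treatment, a known randomization probability, linear outcome regressions, and no sample splitting. The function name `one_step_subsidiary_value` and all of its arguments are hypothetical. The Wald interval treats the estimated policy as fixed, which is only reasonable under conditions such as the margin condition described above.

```python
import numpy as np


def one_step_subsidiary_value(X, A, Y_primary, Y_subsidiary, propensity=0.5):
    """Illustrative one-step (AIPW) estimate of the mean subsidiary outcome
    under a plug-in policy that maximizes the primary outcome.

    Assumes binary A, a known treatment probability, and linear regressions.
    """
    n = len(A)
    design = np.column_stack([np.ones(n), X])

    def fit_arm(y, arm):
        # Least-squares outcome regression fitted within one treatment arm,
        # then evaluated at every observation.
        mask = A == arm
        coef, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
        return design @ coef

    # Estimated policy: treat when the primary-outcome regression favors arm 1.
    policy = (fit_arm(Y_primary, 1) > fit_arm(Y_primary, 0)).astype(int)

    # Subsidiary-outcome regressions at the policy's arm and the observed arm.
    q1, q0 = fit_arm(Y_subsidiary, 1), fit_arm(Y_subsidiary, 0)
    q_policy = np.where(policy == 1, q1, q0)
    q_observed = np.where(A == 1, q1, q0)

    # Inverse-probability weight, nonzero only when the observed arm
    # agrees with the policy's recommendation.
    prob_observed = np.where(A == 1, propensity, 1 - propensity)
    weight = (A == policy) / prob_observed

    # Plug-in term plus the first-order (influence-function) correction.
    scores = q_policy + weight * (Y_subsidiary - q_observed)
    estimate = scores.mean()
    std_err = scores.std(ddof=1) / np.sqrt(n)
    return estimate, (estimate - 1.96 * std_err, estimate + 1.96 * std_err)


# Simulated usage: the primary-optimal policy treats when X > 0, and treatment
# raises the subsidiary outcome by 0.5, so the target value is 1.25.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 1))
A = rng.integers(0, 2, size=n)
Y_primary = A * X[:, 0] + rng.normal(size=n)
Y_subsidiary = 1.0 + 0.5 * A + rng.normal(size=n)
estimate, interval = one_step_subsidiary_value(X, A, Y_primary, Y_subsidiary)
```

A two-stage strategy of the kind described above would instead compute such estimates for every policy in a candidate set and widen the interval so that it holds uniformly over that set; that step is not shown here.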
Information
| Preprint No. | SS-2024-0013 |
|---|---|
| Manuscript ID | SS-2024-0013 |
| Complete Authors | Zhaoqi Li, Houssam Nassif, Alex Luedtke |
| Corresponding Authors | Zhaoqi Li |
| Emails | zli9@stanford.edu |
Acknowledgments
This work was supported by National Institutes of Health award DP2-LM013340
and National Science Foundation award DMS-2210216.
Supplementary Materials
The online Supplementary Material contains proofs of the main theorems and
lemmas, as well as additional experiments.