
Statistica Sinica 25 (2015), 901-920

PENALIZED Q-LEARNING FOR DYNAMIC TREATMENT REGIMENS
Rui Song¹, Weiwei Wang², Donglin Zeng³ and Michael R. Kosorok³
¹North Carolina State University, ²Two Sigma Investment
and ³University of North Carolina, Chapel Hill

Abstract: A dynamic treatment regimen incorporates both accrued information and the long-term effects of treatment from specially designed clinical trials. As these trials, together with longitudinal data from clinical studies, become increasingly common, developing statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework, penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure, individual selection, along with methods for incorporating it within penalized Q-learning. Extensive numerical studies compare the proposed methods with existing methods under a variety of scenarios and demonstrate that the proposed approach is superior both inferentially and computationally. The approach is illustrated with data from a depression clinical trial.

Key words and phrases: Dynamic treatment regimen, individual selection, multi-stage, penalized Q-learning, Q-learning, shrinkage, two-stage procedure.
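As background for the method summarized above, here is a minimal sketch of standard (unpenalized) two-stage Q-learning, the procedure that penalized Q-learning builds on. It assumes linear working models, binary treatments coded as ±1, and a single outcome observed after stage 2; the function names and the way the patient histories `H1` and `H2` are built are hypothetical illustrations, not the authors' implementation. The step that forms the pseudo-outcome uses `|H2 @ psi2|`, which is non-smooth where the estimated treatment effect is near zero. This is the well-known source of nonregularity that complicates inference in standard Q-learning.

```python
import numpy as np

def design(H, A):
    """Stack main-effect columns H with treatment-interaction columns A * H."""
    return np.hstack([H, A[:, None] * H])

def fit_stage(H, A, y):
    """Least-squares fit of y ~ H @ beta + A * (H @ psi); returns (beta, psi)."""
    coef, *_ = np.linalg.lstsq(design(H, A), y, rcond=None)
    p = H.shape[1]
    return coef[:p], coef[p:]

def two_stage_q_learning(X1, A1, X2, A2, Y):
    """Backward induction with linear Q-functions (hypothetical helper).

    X1, X2: (n, k) covariate arrays; A1, A2: (n,) arrays in {-1, 1}; Y: (n,) outcome.
    """
    n = len(Y)
    ones = np.ones((n, 1))
    H1 = np.hstack([ones, X1])        # stage-1 history (assumption: intercept + X1)
    H2 = np.hstack([ones, X1, X2])    # stage-2 history (assumption: intercept + X1 + X2)

    # Stage 2: regress the observed outcome on history and treatment interactions.
    beta2, psi2 = fit_stage(H2, A2, Y)

    # Pseudo-outcome: max over a2 in {-1, 1} of H2 beta2 + a2 * H2 psi2,
    # which equals H2 beta2 + |H2 psi2| (the non-smooth step).
    y_tilde = H2 @ beta2 + np.abs(H2 @ psi2)

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    beta1, psi1 = fit_stage(H1, A1, y_tilde)
    return (beta1, psi1), (beta2, psi2)

def recommend(H, psi):
    """Estimated rule: treat with +1 when the fitted treatment effect is nonnegative."""
    return np.where(H @ psi >= 0, 1, -1)
```

The penalized variant proposed in the paper changes how the treatment effects are estimated; that penalty and the individual-selection step are not reproduced here.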
