Q- and A-learning methods for estimating optimal dynamic treatment regimes

Phillip J. Schulte, Anastasios A. Tsiatis, Eric B. Laber, Marie Davidian

Research output: Contribution to journal › Article › peer-review

82 Scopus citations


In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on the patient's baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the information accrued up to that point. A key goal is to use existing data to estimate the optimal regime, which, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
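
As a rough sketch of the backward recursion that Q-learning builds on, the optimal rules can be written as below. The notation is assumed here for illustration and does not come from the abstract: K decision points, H_k the information accrued up to decision k, A_k the treatment chosen at decision k, and Y a final outcome for which larger values are better.

```latex
% Assumed notation (not from the abstract): K decision points, accrued
% history H_k, treatment A_k, final outcome Y with larger values preferred.
\begin{align*}
Q_K(h_K, a_K) &= E\left[\, Y \mid H_K = h_K,\ A_K = a_K \,\right], \\
V_k(h_k) &= \max_{a_k} Q_k(h_k, a_k), \qquad k = K, \dots, 1, \\
Q_k(h_k, a_k) &= E\left[\, V_{k+1}(H_{k+1}) \mid H_k = h_k,\ A_k = a_k \,\right], \qquad k = K-1, \dots, 1, \\
d_k^{\mathrm{opt}}(h_k) &= \arg\max_{a_k} Q_k(h_k, a_k), \qquad k = 1, \dots, K.
\end{align*}
```

Under the usual identifying assumptions (no unmeasured confounding among them), Q-learning posits a regression model for each Q_k and fits the models backward from k = K. A-learning instead models only the contrast between treatment options, for example Q_k(h_k, 1) - Q_k(h_k, 0) for a binary treatment, which is the only part of Q_k that the decision rule depends on.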

Original language: English (US)
Pages (from-to): 640-661
Number of pages: 22
Journal: Statistical Science
Issue number: 4
State: Published - 2014


Keywords

  • Advantage learning
  • Bias-variance trade-off
  • Model misspecification
  • Personalized medicine
  • Potential outcomes
  • Sequential decision-making

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics(all)
  • Statistics, Probability and Uncertainty

