TY - CONF
T1 - The unreasonable effectiveness of inverse reinforcement learning in advancing cancer research
AU - Kalantari, John
AU - Nelson, Heidi
AU - Chia, Nicholas
N1 - Funding Information:
Acknowledgments. We would first like to thank the patients who volunteered for the original study. We also thank the many other individuals who made this work possible, including members of the Mayo Clinic Microbiome Laboratory, study coordinators, students, colorectal surgeons, program directors, and pathology assistants. We specially acknowledge the assistance of Gregory Dougherty during software development and Dr. Lisa Boardman for her clinical insight. We would like to thank the National Cancer Institute (R01CA179243), the DeWitt and Curtiss families, and the Center for Individualized Medicine at Mayo Clinic for their generous support of this work. Last but not least, we thank Mathieu Wiepert, Marie Kostecki, and Shamailah Lando for keeping the high-performance computing clusters running through our exhaustive calculations.
Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
AB - The “No Free Lunch” theorem states that for any algorithm, elevated performance over one class of problems is offset by its performance over another. Stated differently, no algorithm works for everything. Instead, designing effective algorithms often means exploiting prior knowledge of data relationships specific to a given problem. This “unreasonable effectiveness” is especially desirable for complex and seemingly intractable problems in the natural sciences. One such area rife with the need for better algorithms is cancer biology, a field where relatively few insights are being generated from relatively large amounts of data. In part, this is due to the inability of mere statistics to reflect cancer as a genetic evolutionary process, one that involves cells actively mutating in order to navigate host barriers, outcompete neighboring cells, and expand spatially. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell’s complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of “expert demonstrations” extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. Here, we introduce a Bayesian nonparametric IRL model (PUR-IRL) where the number of reward functions is a priori unbounded in order to account for uncertainty in cancer data, i.e., the existence of latent trajectories and non-uniform sampling. We show that PUR-IRL is “unreasonably effective” at gaining interpretable and intuitive insights about cancer progression from high-dimensional genome data.
UR - http://www.scopus.com/inward/record.url?scp=85104666545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104666545&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85104666545
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 437
EP - 445
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI Press
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Y2 - 7 February 2020 through 12 February 2020
ER -