TY - CONF
T1 - The unreasonable effectiveness of inverse reinforcement learning in advancing cancer research
AU - Kalantari, John
AU - Nelson, Heidi
AU - Chia, Nicholas
N1 - Funding Information:
Acknowledgments. We would first like to thank the patients who volunteered for the original study. We also thank the many other individuals who made this work possible, including members of the Mayo Clinic Microbiome Laboratory, study coordinators, students, colorectal surgeons, program directors, and pathology assistants. We specially acknowledge the assistance of Gregory Dougherty during software development and Dr. Lisa Boardman for her clinical insight. We would like to thank the National Cancer Institute (R01CA179243), the DeWitt and Curtiss families, and the Center for Individualized Medicine at Mayo Clinic for their generous support of this work. Last but not least, we thank Mathieu Wiepert, Marie Kostecki, and Shamailah Lando for keeping the high-performance computing clusters running through our exhaustive calculations.
Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
AB - The “No Free Lunch” theorem states that for any algorithm, elevated performance over one class of problems is offset by its performance over another. Stated differently, no algorithm works for everything. Instead, designing effective algorithms often means exploiting prior knowledge of data relationships specific to a given problem. This “unreasonable effectiveness” is especially desirable for complex and seemingly intractable problems in the natural sciences. One such area rife with the need for better algorithms is cancer biology, a field where relatively few insights are being generated from relatively large amounts of data. In part, this is due to the inability of mere statistics to reflect cancer as a genetic evolutionary process, one that involves cells actively mutating in order to navigate host barriers, outcompete neighboring cells, and expand spatially. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell’s complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of “expert demonstrations” extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. Here, we introduce a Bayesian nonparametric IRL model (PUR-IRL) where the number of reward functions is a priori unbounded in order to account for uncertainty in cancer data, i.e., the existence of latent trajectories and non-uniform sampling. We show that PUR-IRL is “unreasonably effective” at gaining interpretable and intuitive insights about cancer progression from high-dimensional genome data.
UR - http://www.scopus.com/inward/record.url?scp=85104666545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104666545&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85104666545
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 437
EP - 445
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI Press
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Y2 - 7 February 2020 through 12 February 2020
ER -