A Computational Method for Learning Disease Trajectories from Partially Observable EHR Data

Wonsuk Oh, Michael S. Steinbach, M. Regina Castro, Kevin A. Peterson, Vipin Kumar, Pedro J. Caraballo, Gyorgy J. Simon

Research output: Contribution to journal › Article › peer-review


Diseases can progress differently even among patients who share the same risk factors. Recent studies have shown that trajectories, the order in which diseases manifest throughout life, can be predictive of the course of progression. In this study, we propose a novel computational method for learning disease trajectories from EHR data. The proposed method consists of three parts: first, an algorithm for extracting trajectories from EHR data; second, three criteria for filtering trajectories; and third, a likelihood function for assessing the risk of developing a set of outcomes given a trajectory set. We applied our method to extract a set of disease trajectories from Mayo Clinic EHR data and evaluated it internally based on log-likelihood, which can be interpreted as the trajectories' ability to explain the observed (partial) disease progressions. We then externally evaluated the trajectories on EHR data from an independent health system, M Health Fairview. The proposed algorithm extracted a comprehensive set of disease trajectories that explained the observed outcomes substantially better than competing methods. The proposed filtering criteria selected a small, highly interpretable subset of these trajectories at only a minimal (5% relative) loss in the ability to explain disease progression in both the internal and external validation.
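To make the notion of a trajectory concrete, the following is a minimal illustrative sketch, not the authors' extraction algorithm. It assumes a hypothetical record layout of (patient id, disease, diagnosis date) tuples, orders each patient's diseases by first recorded date, and counts how often each ordering occurs. The function name, the toy records, and the alphabetical tie-breaking rule are assumptions made for illustration only, and the sketch does not model partial observability.

```python
from collections import Counter, defaultdict
from datetime import date


def extract_trajectories(records):
    """Map each patient to a tuple of diseases ordered by first recorded date.

    `records` is an iterable of (patient_id, disease, diagnosis_date) tuples.
    This layout is a hypothetical simplification of EHR data.
    """
    first_seen = defaultdict(dict)
    for patient_id, disease, when in records:
        seen = first_seen[patient_id]
        # Keep only the earliest date on which each disease was recorded.
        if disease not in seen or when < seen[disease]:
            seen[disease] = when
    # Sort by first date; ties are broken alphabetically (an assumption).
    return {
        pid: tuple(sorted(seen, key=lambda d: (seen[d], d)))
        for pid, seen in first_seen.items()
    }


# Toy records, invented purely for illustration.
records = [
    ("p1", "obesity", date(2005, 3, 1)),
    ("p1", "hypertension", date(2009, 6, 15)),
    ("p1", "diabetes", date(2012, 1, 20)),
    ("p1", "hypertension", date(2013, 2, 2)),  # repeat diagnosis, ignored
    ("p2", "hypertension", date(2007, 5, 9)),
    ("p2", "obesity", date(2010, 8, 30)),
    ("p3", "obesity", date(2001, 1, 1)),
    ("p3", "hypertension", date(2002, 1, 1)),
    ("p3", "diabetes", date(2003, 1, 1)),
]

trajectories = extract_trajectories(records)
# Frequency of each distinct disease ordering across patients.
counts = Counter(trajectories.values())
```

In this toy data, patients p1 and p3 share one ordering while p2 follows a different one, which is the kind of distinction a trajectory-based analysis is meant to capture.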

Original language: English (US)
Article number: 9456038
Pages (from-to): 2476-2486
Number of pages: 11
Journal: IEEE Journal of Biomedical and Health Informatics
Issue number: 7
State: Published - Jul 2021


Keywords

  • Disease trajectories
  • electronic health records
  • machine learning

ASJC Scopus subject areas

  • Biotechnology
  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Health Information Management

