A novel method for causal structure discovery from EHR data and its application to type-2 diabetes mellitus

Xinpeng Shen, Sisi Ma, Prashanthi Vemuri, M. Regina Castro, Pedro J. Caraballo, Gyorgy J. Simon

Research output: Contribution to journal › Article › peer-review


Modern AI-based clinical decision support models owe their success in part to the very large number of predictors they use. Safe and robust decision support, especially for intervention planning, requires causal, not associative, relationships. Traditional methods of causal discovery (clinical trials and the extraction of biochemical pathways) are resource intensive and may not scale to the number and complexity of relationships needed for precision treatment planning. Computational causal structure discovery (CSD) from electronic health records (EHR) data offers a possible solution; however, current CSD methods fall short on EHR data. This paper presents a CSD method tailored to EHR data and demonstrates its application to type-2 diabetes mellitus. A large EHR dataset from Mayo Clinic was used as the development cohort, and another large dataset from an independent health system, M Health Fairview, as the external validation cohort. The proposed method achieved very high recall (0.95) and substantially higher precision than the general-purpose methods (0.84 versus 0.29 and 0.55). The causal relationships extracted from the development and external validation cohorts had a high (81%) overlap. Owing to its adaptations to EHR data, the proposed method is more suitable for use in clinical decision support than the general-purpose methods.

Original language: English (US)
Article number: 21025
Journal: Scientific Reports
Issue number: 1
State: Published - Dec 2021

ASJC Scopus subject areas

  • General

