Comparative Effectiveness of Machine Learning Approaches for Predicting Gastrointestinal Bleeds in Patients Receiving Antithrombotic Treatment

Jeph Herrin, Neena S. Abraham, Xiaoxi Yao, Peter A. Noseworthy, Jonathan Inselman, Nilay D. Shah, Che Ngufor

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Importance: Anticipating the risk of gastrointestinal bleeding (GIB) when initiating antithrombotic treatment (oral antiplatelets or anticoagulants) is limited by existing risk prediction models. Machine learning algorithms may result in superior predictive models to aid in clinical decision-making.

Objective: To compare the performance of 3 machine learning approaches with the commonly used HAS-BLED (hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use) risk score in predicting antithrombotic-related GIB.

Design, Setting, and Participants: This retrospective cross-sectional study used data from the OptumLabs Data Warehouse, which contains medical and pharmacy claims on privately insured patients and Medicare Advantage enrollees in the US. The study cohort included patients 18 years or older with a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism who were prescribed oral anticoagulant and/or thienopyridine antiplatelet agents between January 1, 2016, and December 31, 2019.

Exposures: A cohort of patients prescribed oral anticoagulant and thienopyridine antiplatelet agents was divided into development and validation cohorts based on date of index prescription. The development cohort was used to train 3 machine learning models to predict GIB at 6 and 12 months: regularized Cox proportional hazards regression (RegCox), random survival forests (RSF), and extreme gradient boosting (XGBoost).

Main Outcomes and Measures: The performance of the models for predicting GIB in the validation cohort, evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and prediction density plots. Relative importance scores were used to identify the variables that were most influential in the top-performing machine learning model.

Results: In the entire study cohort of 306,463 patients, 166,177 (54.2%) were male, 193,648 (63.2%) were White, the mean (SD) age was 69.0 (12.6) years, and 12,322 (4.0%) had experienced a GIB. In the validation data set, the HAS-BLED model had an AUC of 0.60 for predicting GIB at 6 months and 0.59 at 12 months. The RegCox model performed the best in the validation set, with an AUC of 0.67 at 6 months and 0.66 at 12 months. XGBoost was similar, with AUCs of 0.67 at 6 months and 0.66 at 12 months, whereas for RSF, AUCs were 0.62 at 6 months and 0.60 at 12 months. The variables with the highest importance scores in the RegCox model were prior GI bleed (importance score, 0.72); atrial fibrillation, ischemic heart disease, and venous thromboembolism combined (importance score, 0.38); and use of gastroprotective agents (importance score, 0.32).

Conclusions and Relevance: In this cross-sectional study, the machine learning models examined showed similar performance in identifying patients at high risk for GIB after being prescribed antithrombotic agents. Two models (RegCox and XGBoost) performed modestly better than the HAS-BLED score. A prospective evaluation of the RegCox model compared with HAS-BLED may provide a better understanding of the clinical impact of improved performance.
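
The validation metrics named in the abstract (AUC, sensitivity, specificity, and positive predictive value) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc` and `threshold_metrics`, the toy risk scores, and the 0.5 cutoff are all hypothetical, and the sketch treats GIB as a simple binary outcome at a fixed horizon, ignoring the censoring that a survival analysis would have to handle.

```python
from typing import Dict, Sequence


def auc(risks: Sequence[float], events: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen patient with the event
    has a higher predicted risk than one without it (ties count as 0.5)."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    risks: Sequence[float], events: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV when patients with predicted
    risk >= threshold are flagged as high risk."""
    tp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 1)
    fp = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 0)
    fn = sum(1 for r, e in zip(risks, events) if r < threshold and e == 1)
    tn = sum(1 for r, e in zip(risks, events) if r < threshold and e == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV is undefined when no patient is flagged.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes (1 = GIB), for illustration only.
    toy_risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_events = [1, 0, 1, 0, 0, 1, 0, 0]
    print("AUC:", round(auc(toy_risks, toy_events), 3))
    print(threshold_metrics(toy_risks, toy_events, threshold=0.5))
```

AUC depends only on how patients are ranked, whereas sensitivity, specificity, and positive predictive value all depend on the cutoff chosen, which is one reason a study might report them together.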

Original language: English (US)
Article number: 10703
Journal: JAMA Network Open
Issue number: 5
State: Published - May 21 2021

ASJC Scopus subject areas

  • General Medicine

