Comparative Effectiveness of Machine Learning Approaches for Predicting Gastrointestinal Bleeds in Patients Receiving Antithrombotic Treatment

Jeph Herrin; Neena S. Abraham; Xiaoxi Yao; Peter A. Noseworthy; Jonathan Inselman; Nilay D. Shah; Che Ngufor

doi:10.1001/jamanetworkopen.2021.10703

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Importance: Anticipating the risk of gastrointestinal bleeding (GIB) when initiating antithrombotic treatment (oral antiplatelets or anticoagulants) is limited by existing risk prediction models. Machine learning algorithms may result in superior predictive models to aid in clinical decision-making.

Objective: To compare the performance of 3 machine learning approaches with the commonly used HAS-BLED (hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use) risk score in predicting antithrombotic-related GIB.

Design, Setting, and Participants: This retrospective cross-sectional study used data from the OptumLabs Data Warehouse, which contains medical and pharmacy claims on privately insured patients and Medicare Advantage enrollees in the US. The study cohort included patients 18 years or older with a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism who were prescribed oral anticoagulant and/or thienopyridine antiplatelet agents between January 1, 2016, and December 31, 2019.

Exposures: A cohort of patients prescribed oral anticoagulant and thienopyridine antiplatelet agents was divided into development and validation cohorts based on date of index prescription. The development cohort was used to train 3 machine learning models to predict GIB at 6 and 12 months: regularized Cox proportional hazards regression (RegCox), random survival forests (RSF), and extreme gradient boosting (XGBoost).

Main Outcomes and Measures: The performance of the models for predicting GIB in the validation cohort, evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and prediction density plots. Relative importance scores were used to identify the variables that were most influential in the top-performing machine learning model.

Results: In the entire study cohort of 306,463 patients, 166,177 (54.2%) were male, 193,648 (63.2%) were White, the mean (SD) age was 69.0 (12.6) years, and 12,322 (4.0%) had experienced a GIB. In the validation data set, the HAS-BLED model had an AUC of 0.60 for predicting GIB at 6 months and 0.59 at 12 months. The RegCox model performed the best in the validation set, with an AUC of 0.67 at 6 months and 0.66 at 12 months. XGBoost was similar, with AUCs of 0.67 at 6 months and 0.66 at 12 months, whereas for RSF, AUCs were 0.62 at 6 months and 0.60 at 12 months. The variables with the highest importance scores in the RegCox model were prior GI bleed (importance score, 0.72); atrial fibrillation, ischemic heart disease, and venous thromboembolism combined (importance score, 0.38); and use of gastroprotective agents (importance score, 0.32).

Conclusions and Relevance: In this cross-sectional study, the machine learning models examined showed similar performance in identifying patients at high risk for GIB after being prescribed antithrombotic agents. Two models (RegCox and XGBoost) performed modestly better than the HAS-BLED score. A prospective evaluation of the RegCox model compared with HAS-BLED may provide a better understanding of the clinical impact of improved performance.
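The evaluation measures named in the abstract (AUC, sensitivity, specificity, and positive predictive value) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc` and `threshold_metrics`, the toy scores and labels, and the 0.5 cutoff are all hypothetical, and the sketch assumes a simple binary outcome at a fixed horizon, ignoring the censoring that survival models such as RegCox and RSF account for.

```python
from math import nan
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV when scores >= threshold are
    classified as high risk."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        high_risk = s >= threshold
        if high_risk and y == 1:
            tp += 1
        elif high_risk:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes (1 = bleed), toy data only.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
    print(round(auc(toy_scores, toy_labels), 3))
    print(threshold_metrics(toy_scores, toy_labels, threshold=0.5))
```

The pairwise definition is quadratic in cohort size, so it is meant only to show what an AUC of 0.67 versus 0.60 measures: how often a patient who bled was ranked above one who did not. A cohort of the size reported here would call for a rank-based or library implementation.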

Original language	English (US)
Article number	10703
Journal	JAMA Network Open
Volume	4
Issue number	5
DOIs	https://doi.org/10.1001/jamanetworkopen.2021.10703
State	Published - May 21 2021

ASJC Scopus subject areas

General Medicine

Cite this

@article{b070bc21a5e241f59d495228abdd151f,

title = "Comparative Effectiveness of Machine Learning Approaches for Predicting Gastrointestinal Bleeds in Patients Receiving Antithrombotic Treatment",

abstract = "Importance: Anticipating the risk of gastrointestinal bleeding (GIB) when initiating antithrombotic treatment (oral antiplatelets or anticoagulants) is limited by existing risk prediction models. Machine learning algorithms may result in superior predictive models to aid in clinical decision-making. Objective: To compare the performance of 3 machine learning approaches with the commonly used HAS-BLED (hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use) risk score in predicting antithrombotic-related GIB. Design, Setting, and Participants: This retrospective cross-sectional study used data from the OptumLabs Data Warehouse, which contains medical and pharmacy claims on privately insured patients and Medicare Advantage enrollees in the US. The study cohort included patients 18 years or older with a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism who were prescribed oral anticoagulant and/or thienopyridine antiplatelet agents between January 1, 2016, and December 31, 2019. Exposures: A cohort of patients prescribed oral anticoagulant and thienopyridine antiplatelet agents was divided into development and validation cohorts based on date of index prescription. The development cohort was used to train 3 machine learning models to predict GIB at 6 and 12 months: regularized Cox proportional hazards regression (RegCox), random survival forests (RSF), and extreme gradient boosting (XGBoost). Main Outcomes and Measures: The performance of the models for predicting GIB in the validation cohort, evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and prediction density plots. Relative importance scores were used to identify the variables that were most influential in the top-performing machine learning model. 
Results: In the entire study cohort of 306463 patients, 166177 (54.2%) were male, 193648 (63.2%) were White, the mean (SD) age was 69.0 (12.6) years, and 12322 (4.0%) had experienced a GIB. In the validation data set, the HAS-BLED model had an AUC of 0.60 for predicting GIB at 6 months and 0.59 at 12 months. The RegCox model performed the best in the validation set, with an AUC of 0.67 at 6 months and 0.66 at 12 months. XGBoost was similar, with AUCs of 0.67 at 6 months and 0.66 at 12 months, whereas for RSF, AUCs were 0.62 at 6 months and 0.60 at 12 months. The variables with the highest importance scores in the RegCox model were prior GI bleed (importance score, 0.72); atrial fibrillation, ischemic heart disease, and venous thromboembolism combined (importance score, 0.38); and use of gastroprotective agents (importance score, 0.32). Conclusions and Relevance: In this cross-sectional study, the machine learning models examined showed similar performance in identifying patients at high risk for GIB after being prescribed antithrombotic agents. Two models (RegCox and XGBoost) performed modestly better than the HAS-BLED score. A prospective evaluation of the RegCox model compared with HAS-BLED may provide a better understanding of the clinical impact of improved performance.",

author = "Jeph Herrin and Abraham, {Neena S.} and Xiaoxi Yao and Noseworthy, {Peter A.} and Jonathan Inselman and Shah, {Nilay D.} and Che Ngufor",

year = "2021",

month = may,

day = "21",

doi = "10.1001/jamanetworkopen.2021.10703",

language = "English (US)",

volume = "4",

journal = "JAMA Network Open",

issn = "2574-3805",

publisher = "American Medical Association",

number = "5",

}

TY - JOUR

T1 - Comparative Effectiveness of Machine Learning Approaches for Predicting Gastrointestinal Bleeds in Patients Receiving Antithrombotic Treatment

AU - Herrin, Jeph

AU - Abraham, Neena S.

AU - Yao, Xiaoxi

AU - Noseworthy, Peter A.

AU - Inselman, Jonathan

AU - Shah, Nilay D.

AU - Ngufor, Che

PY - 2021/5/21

Y1 - 2021/5/21

AB - Importance: Anticipating the risk of gastrointestinal bleeding (GIB) when initiating antithrombotic treatment (oral antiplatelets or anticoagulants) is limited by existing risk prediction models. Machine learning algorithms may result in superior predictive models to aid in clinical decision-making. Objective: To compare the performance of 3 machine learning approaches with the commonly used HAS-BLED (hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use) risk score in predicting antithrombotic-related GIB. Design, Setting, and Participants: This retrospective cross-sectional study used data from the OptumLabs Data Warehouse, which contains medical and pharmacy claims on privately insured patients and Medicare Advantage enrollees in the US. The study cohort included patients 18 years or older with a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism who were prescribed oral anticoagulant and/or thienopyridine antiplatelet agents between January 1, 2016, and December 31, 2019. Exposures: A cohort of patients prescribed oral anticoagulant and thienopyridine antiplatelet agents was divided into development and validation cohorts based on date of index prescription. The development cohort was used to train 3 machine learning models to predict GIB at 6 and 12 months: regularized Cox proportional hazards regression (RegCox), random survival forests (RSF), and extreme gradient boosting (XGBoost). Main Outcomes and Measures: The performance of the models for predicting GIB in the validation cohort, evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and prediction density plots. Relative importance scores were used to identify the variables that were most influential in the top-performing machine learning model. 
Results: In the entire study cohort of 306463 patients, 166177 (54.2%) were male, 193648 (63.2%) were White, the mean (SD) age was 69.0 (12.6) years, and 12322 (4.0%) had experienced a GIB. In the validation data set, the HAS-BLED model had an AUC of 0.60 for predicting GIB at 6 months and 0.59 at 12 months. The RegCox model performed the best in the validation set, with an AUC of 0.67 at 6 months and 0.66 at 12 months. XGBoost was similar, with AUCs of 0.67 at 6 months and 0.66 at 12 months, whereas for RSF, AUCs were 0.62 at 6 months and 0.60 at 12 months. The variables with the highest importance scores in the RegCox model were prior GI bleed (importance score, 0.72); atrial fibrillation, ischemic heart disease, and venous thromboembolism combined (importance score, 0.38); and use of gastroprotective agents (importance score, 0.32). Conclusions and Relevance: In this cross-sectional study, the machine learning models examined showed similar performance in identifying patients at high risk for GIB after being prescribed antithrombotic agents. Two models (RegCox and XGBoost) performed modestly better than the HAS-BLED score. A prospective evaluation of the RegCox model compared with HAS-BLED may provide a better understanding of the clinical impact of improved performance.

UR - http://www.scopus.com/inward/record.url?scp=85107008021&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85107008021&partnerID=8YFLogxK

U2 - 10.1001/jamanetworkopen.2021.10703

DO - 10.1001/jamanetworkopen.2021.10703

M3 - Article

C2 - 34019087

AN - SCOPUS:85107008021

SN - 2574-3805

VL - 4

JO - JAMA Network Open

JF - JAMA Network Open

IS - 5

M1 - 10703

ER -
