TY - JOUR
T1 - Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure
T2 - Comparison of machine learning and other statistical approaches
AU - Frizzell, Jarrod D.
AU - Liang, Li
AU - Schulte, Phillip J.
AU - Yancy, Clyde W.
AU - Heidenreich, Paul A.
AU - Hernandez, Adrian F.
AU - Bhatt, Deepak L.
AU - Fonarow, Gregg C.
AU - Laskey, Warren K.
N1 - Funding Information:
Funding/Support: Research was funded by the
Publisher Copyright:
Copyright 2017 American Medical Association. All rights reserved.
PY - 2017/2
Y1 - 2017/2
N2 - IMPORTANCE: Several attempts have been made at developing models to predict 30-day readmissions in patients with heart failure, but none have sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have potential advantages over traditional statistical modeling. OBJECTIVE: To develop models using a ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with models developed using “conventional” statistically based methods. DESIGN, SETTING, AND PARTICIPANTS: Models were developed using ML algorithms, specifically, a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model and compared with traditional statistical methods using 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2010, and analysis of the data was conducted between November 25, 2014, and June 30, 2016. MAIN OUTCOMES AND MEASURES: C statistics were used for comparison of discriminatory capacity across models in the validation sample. RESULTS: The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set. CONCLUSIONS AND RELEVANCE: Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our study fits within the literature of limited predictive ability for heart failure readmissions.
UR - http://www.scopus.com/inward/record.url?scp=85017203403&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017203403&partnerID=8YFLogxK
U2 - 10.1001/jamacardio.2016.3956
DO - 10.1001/jamacardio.2016.3956
M3 - Article
C2 - 27784047
AN - SCOPUS:85017203403
SN - 2380-6583
VL - 2
SP - 204
EP - 209
JO - JAMA Cardiology
JF - JAMA Cardiology
IS - 2
ER -