TY - GEN
T1 - Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes
AU - Farahani, Nasibeh Zanjirani
AU - Sundaram, Divaakar Siva Baala
AU - Enayati, Moein
AU - Arunachalam, Shivaram Poigai
AU - Pasupathy, Kalyan
AU - Arruda-Olson, Adelaide M.
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/16
Y1 - 2020/12/16
N2 - Hypertrophic cardiomyopathy (HCM) is a genetic heart disease that is the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Developing machine learning models on electronic health record (EHR) data can support better diagnosis of HCM and thus improve hundreds of patients' lives. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions by means of information learned from historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes of these patients were extracted from the EHR data warehouse. Ground truth labels for training the machine learning model were provided by confirmed HCM diagnosis using the gold-standard imaging tests for HCM diagnosis: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. As a result, patients were labeled into three categories, 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible HCM', after a manual review of medical records and imaging tests. In this study, a random forest was adopted to investigate the predictive performance of billing codes for the identification of HCM patients, due to its practical application and expected accuracy in a wide range of use cases. Our model performed well in finding patients with 'yes' (definite), 'possible', and 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling and principal component analysis to provide insights for clinicians' interpretation. This model can be used for the identification of HCM patients using their EHR data and can help clinicians in their diagnostic decision making.
AB - Hypertrophic cardiomyopathy (HCM) is a genetic heart disease that is the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Developing machine learning models on electronic health record (EHR) data can support better diagnosis of HCM and thus improve hundreds of patients' lives. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions by means of information learned from historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes of these patients were extracted from the EHR data warehouse. Ground truth labels for training the machine learning model were provided by confirmed HCM diagnosis using the gold-standard imaging tests for HCM diagnosis: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. As a result, patients were labeled into three categories, 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible HCM', after a manual review of medical records and imaging tests. In this study, a random forest was adopted to investigate the predictive performance of billing codes for the identification of HCM patients, due to its practical application and expected accuracy in a wide range of use cases. Our model performed well in finding patients with 'yes' (definite), 'possible', and 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling and principal component analysis to provide insights for clinicians' interpretation. This model can be used for the identification of HCM patients using their EHR data and can help clinicians in their diagnostic decision making.
KW - billing code
KW - classification
KW - decision making
KW - diagnostic codes
KW - electronic health records (EHR)
KW - hypertrophic cardiomyopathy (HCM)
KW - machine learning
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=85100348544&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100348544&partnerID=8YFLogxK
U2 - 10.1109/BIBM49941.2020.9313231
DO - 10.1109/BIBM49941.2020.9313231
M3 - Conference contribution
AN - SCOPUS:85100348544
T3 - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
SP - 1932
EP - 1937
BT - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
A2 - Park, Taesung
A2 - Cho, Young-Rae
A2 - Hu, Xiaohua Tony
A2 - Yoo, Illhoi
A2 - Woo, Hyun Goo
A2 - Wang, Jianxin
A2 - Facelli, Julio
A2 - Nam, Seungyoon
A2 - Kang, Mingon
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Y2 - 16 December 2020 through 19 December 2020
ER -