Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes

Nasibeh Zanjirani Farahani; Divaakar Siva Baala Sundaram; Moein Enayati; Shivaram Poigai Arunachalam; Kalyan Pasupathy; Adelaide M. Arruda-Olson

doi:10.1109/BIBM49941.2020.9313231

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Machine learning models developed on electronic health record (EHR) data can help improve the diagnosis of HCM and thus the lives of hundreds of patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions using information learned from the historical data of similar patients.

We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were derived from confirmed HCM diagnoses using the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. After a manual review of medical records and imaging tests, patients were assigned to three categories: 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible' HCM.

In this study, we adopted a random forest, chosen for its practical applicability and expected accuracy across a wide range of use cases, to investigate the predictive performance of billing codes for identifying HCM patients. Our model performed well in classifying patients as 'yes' (definite), 'possible', or 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling (MDS) and principal component analysis (PCA) to support clinicians' interpretation.

This model can be used to identify HCM patients from their EHR data and to help clinicians in their diagnostic decision making.
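
The abstract outlines a pipeline with four parts: billing codes as features, a random forest over three labels, support-weighted metrics, and MDS/PCA visualizations. The Python sketch below illustrates those steps. It is not the authors' implementation, and every detail rests on an assumption: scikit-learn as the library, a synthetic cohort, hypothetical code names (`CODE_000`, `HYPOTHETICAL_HCM_CODE`, `HYPOTHETICAL_ECHO_CODE`), binary "code ever billed" features, a single 70/30 split, and arbitrary hyperparameters. Because the data are synthetic, the printed numbers say nothing about the paper's results.

```python
# Minimal sketch, NOT the authors' code: synthetic billing-code data,
# hypothetical code names, and illustrative hyperparameters throughout.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

LABELS = ["yes", "possible", "no"]  # the three label categories named in the abstract
# Hypothetical codes that (in this toy data only) co-occur with a label.
SIGNAL_CODE = {"yes": "HYPOTHETICAL_HCM_CODE", "possible": "HYPOTHETICAL_ECHO_CODE"}


def make_synthetic_cohort(n_patients=300, n_background=40, seed=0):
    """Return (per-patient code lists, labels) for a random toy cohort."""
    rng = np.random.default_rng(seed)
    background = [f"CODE_{i:03d}" for i in range(n_background)]
    records, labels = [], []
    for _ in range(n_patients):
        label = LABELS[rng.integers(len(LABELS))]
        codes = {str(c) for c in rng.choice(background, size=5, replace=False)}
        # 80% of 'yes'/'possible' patients carry their label's signal code.
        if label in SIGNAL_CODE and rng.random() < 0.8:
            codes.add(SIGNAL_CODE[label])
        records.append(sorted(codes))
        labels.append(label)
    return records, np.array(labels)


def run_pipeline(seed=0):
    records, y = make_synthetic_cohort(seed=seed)

    # Multi-hot encoding: one binary column per distinct code (1 = code ever billed).
    X = MultiLabelBinarizer().fit_transform(records)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Support-weighted averages over the classes. Weighted recall is
    # mathematically identical to accuracy on the same predictions.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )

    # 2-D projections of the test patients' code vectors for visual inspection.
    pca_xy = PCA(n_components=2, random_state=seed).fit_transform(X_test)
    mds_xy = MDS(n_components=2, n_init=4, random_state=seed).fit_transform(X_test)

    return {
        "n_patients": X.shape[0],
        "n_test": len(y_test),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "pca": pca_xy,
        "mds": mds_xy,
    }


if __name__ == "__main__":
    result = run_pipeline()
    for name in ("accuracy", "precision", "recall", "f1"):
        print(f"{name}: {result[name]:.2f}")
```

Running the script prints the four metrics for the toy cohort. Under support weighting, each class's recall is weighted by that class's share of the true labels, so weighted recall reduces to accuracy. Weighted precision and weighted F1 are therefore the averaged metrics that carry information beyond accuracy. The sketch applies PCA and MDS directly to the binary code vectors, which is itself an assumption: the abstract does not say which representation was projected.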

Original language	English (US)
Title of host publication	Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Editors	Taesung Park, Young-Rae Cho, Xiaohua Tony Hu, Illhoi Yoo, Hyun Goo Woo, Jianxin Wang, Julio Facelli, Seungyoon Nam, Mingon Kang
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1932-1937
Number of pages	6
ISBN (Electronic)	9781728162157
DOIs	https://doi.org/10.1109/BIBM49941.2020.9313231
State	Published - Dec 16 2020
Event	2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 - Virtual, Seoul, Korea, Republic of
Duration	Dec 16 2020 → Dec 19 2020

Publication series

Name	Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Conference

Conference	2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Country/Territory	Korea, Republic of
City	Virtual, Seoul
Period	12/16/20 → 12/19/20

Keywords

billing code
classification
decision making
diagnostic codes
electronic health records (EHR)
hypertrophic cardiomyopathy (HCM)
machine learning
random forest

ASJC Scopus subject areas

Computer Science Applications
Information Systems and Management
Medicine (miscellaneous)
Health Informatics

Cite this

Farahani, N. Z., Sundaram, D. S. B., Enayati, M., Arunachalam, S. P., Pasupathy, K., & Arruda-Olson, A. M. (2020). Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes. In T. Park, Y.-R. Cho, X. T. Hu, I. Yoo, H. G. Woo, J. Wang, J. Facelli, S. Nam, & M. Kang (Eds.), Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 (pp. 1932-1937). Article 9313231 (Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM49941.2020.9313231

Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes. / Farahani, Nasibeh Zanjirani; Sundaram, Divaakar Siva Baala; Enayati, Moein et al.
Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020. ed. / Taesung Park; Young-Rae Cho; Xiaohua Tony Hu; Illhoi Yoo; Hyun Goo Woo; Jianxin Wang; Julio Facelli; Seungyoon Nam; Mingon Kang. Institute of Electrical and Electronics Engineers Inc., 2020. p. 1932-1937 9313231 (Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020).

Farahani, NZ, Sundaram, DSB, Enayati, M, Arunachalam, SP, Pasupathy, K & Arruda-Olson, AM 2020, Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes. in T Park, Y-R Cho, XT Hu, I Yoo, HG Woo, J Wang, J Facelli, S Nam & M Kang (eds), Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, 9313231, Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, Institute of Electrical and Electronics Engineers Inc., pp. 1932-1937, 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, Virtual, Seoul, Korea, Republic of, 12/16/20. https://doi.org/10.1109/BIBM49941.2020.9313231

Farahani NZ, Sundaram DSB, Enayati M, Arunachalam SP, Pasupathy K, Arruda-Olson AM. Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes. In Park T, Cho YR, Hu XT, Yoo I, Woo HG, Wang J, Facelli J, Nam S, Kang M, editors, Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020. Institute of Electrical and Electronics Engineers Inc. 2020. p. 1932-1937. 9313231. (Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020). doi: 10.1109/BIBM49941.2020.9313231

Farahani, Nasibeh Zanjirani ; Sundaram, Divaakar Siva Baala ; Enayati, Moein et al. / Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes. Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020. editor / Taesung Park ; Young-Rae Cho ; Xiaohua Tony Hu ; Illhoi Yoo ; Hyun Goo Woo ; Jianxin Wang ; Julio Facelli ; Seungyoon Nam ; Mingon Kang. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 1932-1937 (Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020).

@inproceedings{ff812b0f79fb43ad9cec0bccbf01f152,

title = "Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes",

abstract = "Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Machine learning models developed on electronic health record (EHR) data can help improve the diagnosis of HCM and thus the lives of hundreds of patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions using information learned from the historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were derived from confirmed HCM diagnoses using the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. After a manual review of medical records and imaging tests, patients were assigned to three categories: 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible' HCM. In this study, we adopted a random forest, chosen for its practical applicability and expected accuracy across a wide range of use cases, to investigate the predictive performance of billing codes for identifying HCM patients. Our model performed well in classifying patients as 'yes' (definite), 'possible', or 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling (MDS) and principal component analysis (PCA) to support clinicians' interpretation.
This model can be used to identify HCM patients from their EHR data and to help clinicians in their diagnostic decision making.",

keywords = "billing code, classification, decision making, diagnostic codes, electronic health records (EHR), hypertrophic cardiomyopathy (HCM), machine learning, random forest",

author = "Farahani, {Nasibeh Zanjirani} and Sundaram, {Divaakar Siva Baala} and Moein Enayati and Arunachalam, {Shivaram Poigai} and Kalyan Pasupathy and Arruda-Olson, {Adelaide M.}",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 ; Conference date: 16-12-2020 Through 19-12-2020",

year = "2020",

month = dec,

day = "16",

doi = "10.1109/BIBM49941.2020.9313231",

language = "English (US)",

series = "Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1932--1937",

editor = "Taesung Park and Young-Rae Cho and Hu, {Xiaohua Tony} and Illhoi Yoo and Woo, {Hyun Goo} and Jianxin Wang and Julio Facelli and Seungyoon Nam and Mingon Kang",

booktitle = "Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020",

}

TY - GEN

T1 - Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes

AU - Farahani, Nasibeh Zanjirani

AU - Sundaram, Divaakar Siva Baala

AU - Enayati, Moein

AU - Arunachalam, Shivaram Poigai

AU - Pasupathy, Kalyan

AU - Arruda-Olson, Adelaide M.

PY - 2020/12/16

Y1 - 2020/12/16

N2 - Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Machine learning models developed on electronic health record (EHR) data can help improve the diagnosis of HCM and thus the lives of hundreds of patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions using information learned from the historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were derived from confirmed HCM diagnoses using the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. After a manual review of medical records and imaging tests, patients were assigned to three categories: 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible' HCM. In this study, we adopted a random forest, chosen for its practical applicability and expected accuracy across a wide range of use cases, to investigate the predictive performance of billing codes for identifying HCM patients. Our model performed well in classifying patients as 'yes' (definite), 'possible', or 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling (MDS) and principal component analysis (PCA) to support clinicians' interpretation.
This model can be used to identify HCM patients from their EHR data and to help clinicians in their diagnostic decision making.

AB - Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Machine learning models developed on electronic health record (EHR) data can help improve the diagnosis of HCM and thus the lives of hundreds of patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions using information learned from the historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were derived from confirmed HCM diagnoses using the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. After a manual review of medical records and imaging tests, patients were assigned to three categories: 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible' HCM. In this study, we adopted a random forest, chosen for its practical applicability and expected accuracy across a wide range of use cases, to investigate the predictive performance of billing codes for identifying HCM patients. Our model performed well in classifying patients as 'yes' (definite), 'possible', or 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling (MDS) and principal component analysis (PCA) to support clinicians' interpretation.
This model can be used to identify HCM patients from their EHR data and to help clinicians in their diagnostic decision making.

KW - billing code

KW - classification

KW - decision making

KW - diagnostic codes

KW - electronic health records (EHR)

KW - hypertrophic cardiomyopathy (HCM)

KW - machine learning

KW - random forest

UR - http://www.scopus.com/inward/record.url?scp=85100348544&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85100348544&partnerID=8YFLogxK

U2 - 10.1109/BIBM49941.2020.9313231

DO - 10.1109/BIBM49941.2020.9313231

M3 - Conference contribution

AN - SCOPUS:85100348544

T3 - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

SP - 1932

EP - 1937

BT - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

A2 - Park, Taesung

A2 - Cho, Young-Rae

A2 - Hu, Xiaohua Tony

A2 - Yoo, Illhoi

A2 - Woo, Hyun Goo

A2 - Wang, Jianxin

A2 - Facelli, Julio

A2 - Nam, Seungyoon

A2 - Kang, Mingon

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Y2 - 16 December 2020 through 19 December 2020

ER -