Identification of asthma control factor in clinical notes using a hybrid deep learning model

Bhavani Singh Agnikula Kshatriya; Elham Sagheb; Chung Il Wi; Jungwon Yoon; Hee Yun Seol; Young Juhn; Sunghwan Sohn

doi:10.1186/s12911-021-01633-4


Research output: Contribution to journal › Article › peer-review

Abstract

Background: Guideline-concordant documentation in asthma care varies considerably. Assessing clinicians’ documentation is not feasible with structured data alone; it requires labor-intensive chart review of electronic health records (EHRs). Certain guideline elements among asthma control factors, such as reviewing inhaler technique, require an understanding of context to be captured correctly from EHR free text. Methods: The study data consist of two sets: (1) manually chart-reviewed data, comprising 1039 clinical notes from 300 patients with an asthma diagnosis, and (2) weakly labeled data (distant supervision), comprising 27,363 clinical notes from 800 patients with an asthma diagnosis. A context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify inhaler techniques in EHR free text. Both the original BERT and clinical BioBERT (cBERT) were applied with cost sensitivity to handle imbalanced data. Distant supervision using rule-derived weak labels was also incorporated to augment the training set and reduce the costly manual labeling involved in developing a deep learning algorithm. A hybrid approach using post-hoc rules was also explored to fix BERT model errors. The performance of BERT with and without distant supervision, the hybrid models, and the rule-based model was compared in terms of precision, recall, F-score, and accuracy. Results: The BERT models trained on the original data performed similarly to the rule-based model in F1-score (0.837, 0.845, and 0.838 for rules, BERT, and cBERT, respectively). The BERT models with distant supervision produced higher performance (0.853 and 0.880 for BERT and cBERT, respectively) than both the models without distant supervision and the rule-based model. The hybrid models performed best, with F1-scores of 0.877 and 0.904 for BERT and cBERT, respectively, improving on the corresponding distant supervision models.
Conclusions: The proposed BERT models with distant supervision demonstrated their capability to identify inhaler techniques in EHR free text and outperformed both the rule-based model and the BERT models trained on the original data. A distant supervision approach may reduce the costly manual chart review needed to generate the large training data sets required by most deep learning-based models. The hybrid model was able to fix BERT model errors and further improve performance.
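
The abstract names three mechanisms without showing them: rule-based weak labels for distant supervision, post-hoc rules applied on top of a model's output, and F1-score as the comparison metric. The following is a minimal Python sketch of how these pieces could fit together. It is not the authors' implementation: the regular expressions, the function names (rule_label, weak_label, hybrid_predict, f1_score), and the direction of the post-hoc override are all assumptions made for illustration, and model_predict stands in for any trained classifier.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical patterns; the abstract does not list the study's actual rules.
INHALER_RE = re.compile(r"\binhaler technique\b", re.IGNORECASE)
REVIEW_RE = re.compile(
    r"\b(review(ed)?|demonstrat(ed|ion)|taught|instruct(ed|ion))\b",
    re.IGNORECASE,
)


def rule_label(note: str) -> int:
    """Return 1 if the note mentions inhaler technique together with a review cue."""
    return int(bool(INHALER_RE.search(note)) and bool(REVIEW_RE.search(note)))


def weak_label(notes: Sequence[str]) -> List[Tuple[str, int]]:
    """Distant supervision: attach rule-derived (weak) labels to unlabeled notes."""
    return [(note, rule_label(note)) for note in notes]


def hybrid_predict(note: str, model_predict: Callable[[str], int]) -> int:
    """Post-hoc rule (assumed direction): promote a negative model output
    to positive when the rule fires; otherwise keep the model's output."""
    pred = model_predict(note)
    if pred == 0 and rule_label(note) == 1:
        return 1
    return pred


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the pairs returned by weak_label would be added to the manually labeled notes before fine-tuning, and hybrid_predict would wrap the fine-tuned model at inference time.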

Original language	English (US)
Article number	272
Journal	BMC Medical Informatics and Decision Making
Volume	21
DOIs	https://doi.org/10.1186/s12911-021-01633-4
State	Published - Nov 2021

Keywords

Adherence to asthma guidelines
Context-aware language model
Deep learning
Documentation variations
Inhaler technique
Natural language processing

ASJC Scopus subject areas

Health Policy
Health Informatics
Computer Science Applications


Cite this

@article{86c9d8f1a2244961a3c8c7e9912dbabf,

title = "Identification of asthma control factor in clinical notes using a hybrid deep learning model",

abstract = "Background: Guideline-concordant documentation in asthma care varies considerably. Assessing clinicians{\textquoteright} documentation is not feasible with structured data alone; it requires labor-intensive chart review of electronic health records (EHRs). Certain guideline elements among asthma control factors, such as reviewing inhaler technique, require an understanding of context to be captured correctly from EHR free text. Methods: The study data consist of two sets: (1) manually chart-reviewed data, comprising 1039 clinical notes from 300 patients with an asthma diagnosis, and (2) weakly labeled data (distant supervision), comprising 27,363 clinical notes from 800 patients with an asthma diagnosis. A context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify inhaler techniques in EHR free text. Both the original BERT and clinical BioBERT (cBERT) were applied with cost sensitivity to handle imbalanced data. Distant supervision using rule-derived weak labels was also incorporated to augment the training set and reduce the costly manual labeling involved in developing a deep learning algorithm. A hybrid approach using post-hoc rules was also explored to fix BERT model errors. The performance of BERT with and without distant supervision, the hybrid models, and the rule-based model was compared in terms of precision, recall, F-score, and accuracy. Results: The BERT models trained on the original data performed similarly to the rule-based model in F1-score (0.837, 0.845, and 0.838 for rules, BERT, and cBERT, respectively). The BERT models with distant supervision produced higher performance (0.853 and 0.880 for BERT and cBERT, respectively) than both the models without distant supervision and the rule-based model. The hybrid models performed best, with F1-scores of 0.877 and 0.904 for BERT and cBERT, respectively, improving on the corresponding distant supervision models.
Conclusions: The proposed BERT models with distant supervision demonstrated their capability to identify inhaler techniques in EHR free text and outperformed both the rule-based model and the BERT models trained on the original data. A distant supervision approach may reduce the costly manual chart review needed to generate the large training data sets required by most deep learning-based models. The hybrid model was able to fix BERT model errors and further improve performance.",

keywords = "Adherence to asthma guidelines, Context-aware language model, Deep learning, Documentation variations, Inhaler technique, Natural language processing",

author = "{Agnikula Kshatriya}, {Bhavani Singh} and Elham Sagheb and Wi, {Chung Il} and Jungwon Yoon and Seol, {Hee Yun} and Young Juhn and Sunghwan Sohn",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s).",

year = "2021",

month = nov,

doi = "10.1186/s12911-021-01633-4",

language = "English (US)",

volume = "21",

journal = "BMC Medical Informatics and Decision Making",

issn = "1472-6947",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Identification of asthma control factor in clinical notes using a hybrid deep learning model

AU - Agnikula Kshatriya, Bhavani Singh

AU - Sagheb, Elham

AU - Wi, Chung Il

AU - Yoon, Jungwon

AU - Seol, Hee Yun

AU - Juhn, Young

AU - Sohn, Sunghwan

PY - 2021/11

Y1 - 2021/11

AB - Background: Guideline-concordant documentation in asthma care varies considerably. Assessing clinicians’ documentation is not feasible with structured data alone; it requires labor-intensive chart review of electronic health records (EHRs). Certain guideline elements among asthma control factors, such as reviewing inhaler technique, require an understanding of context to be captured correctly from EHR free text. Methods: The study data consist of two sets: (1) manually chart-reviewed data, comprising 1039 clinical notes from 300 patients with an asthma diagnosis, and (2) weakly labeled data (distant supervision), comprising 27,363 clinical notes from 800 patients with an asthma diagnosis. A context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify inhaler techniques in EHR free text. Both the original BERT and clinical BioBERT (cBERT) were applied with cost sensitivity to handle imbalanced data. Distant supervision using rule-derived weak labels was also incorporated to augment the training set and reduce the costly manual labeling involved in developing a deep learning algorithm. A hybrid approach using post-hoc rules was also explored to fix BERT model errors. The performance of BERT with and without distant supervision, the hybrid models, and the rule-based model was compared in terms of precision, recall, F-score, and accuracy. Results: The BERT models trained on the original data performed similarly to the rule-based model in F1-score (0.837, 0.845, and 0.838 for rules, BERT, and cBERT, respectively). The BERT models with distant supervision produced higher performance (0.853 and 0.880 for BERT and cBERT, respectively) than both the models without distant supervision and the rule-based model. The hybrid models performed best, with F1-scores of 0.877 and 0.904 for BERT and cBERT, respectively, improving on the corresponding distant supervision models.
Conclusions: The proposed BERT models with distant supervision demonstrated their capability to identify inhaler techniques in EHR free text and outperformed both the rule-based model and the BERT models trained on the original data. A distant supervision approach may reduce the costly manual chart review needed to generate the large training data sets required by most deep learning-based models. The hybrid model was able to fix BERT model errors and further improve performance.

KW - Adherence to asthma guidelines

KW - Context-aware language model

KW - Deep learning

KW - Documentation variations

KW - Inhaler technique

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85118773666&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85118773666&partnerID=8YFLogxK

U2 - 10.1186/s12911-021-01633-4

DO - 10.1186/s12911-021-01633-4

M3 - Article

C2 - 34753481

AN - SCOPUS:85118773666

SN - 1472-6947

VL - 21

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

M1 - 272

ER -
