A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR

Kevin J. Peterson; Guoqian Jiang; Hongfang Liu

doi:10.1016/j.jbi.2020.103541

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F₁ score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F₁ score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F₁ score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.
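The abstract reports a weighted average F₁ score for relationship classification. As a minimal sketch of how such a metric is conventionally computed (per-class F₁ averaged with weights proportional to each class's support), the following pure-Python function may be helpful. The label names `severity` and `body_site` are hypothetical placeholders rather than the thirteen relationship types used in the paper, and this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Each class's F1 is weighted by its share of the true labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False positives: predicted this label but the truth differs.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        # False negatives: the remaining true instances of this label.
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical relationship labels, for illustration only.
gold = ["severity", "severity", "body_site", "body_site"]
pred = ["severity", "body_site", "body_site", "body_site"]
print(round(weighted_f1(gold, pred), 4))
```

Weighting by support means frequent relationship types dominate the average, so a single weighted score can mask weak performance on rare types.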

Original language	English (US)
Article number	103541
Journal	Journal of Biomedical Informatics
Volume	110
DOIs	https://doi.org/10.1016/j.jbi.2020.103541
State	Published - Oct 2020

Keywords

Deep Learning (D000077321)
Health Information Interoperability (D000073892)
Natural Language Processing (D009323)
Semantics (D012660)
Systematized Nomenclature of Medicine (D039061)

ASJC Scopus subject areas

Health Informatics
Computer Science Applications

Cite this

@article{ba85ee1f95ff4fe5aab94b247b8422c6,

title = "A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR",

abstract = "Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F1 score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.",

keywords = "Deep Learning (D000077321), Health Information Interoperability (D000073892), Natural Language Processing (D009323), Semantics (D012660), Systematized Nomenclature of Medicine (D039061)",

author = "Peterson, {Kevin J.} and Guoqian Jiang and Hongfang Liu",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Inc.",

year = "2020",

month = oct,

doi = "10.1016/j.jbi.2020.103541",

language = "English (US)",

volume = "110",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR

AU - Peterson, Kevin J.

AU - Jiang, Guoqian

AU - Liu, Hongfang

PY - 2020/10

Y1 - 2020/10

N2 - Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F1 score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.

KW - Deep Learning (D000077321)

KW - Health Information Interoperability (D000073892)

KW - Natural Language Processing (D009323)

KW - Semantics (D012660)

KW - Systematized Nomenclature of Medicine (D039061)

UR - http://www.scopus.com/inward/record.url?scp=85090064244&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85090064244&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2020.103541

DO - 10.1016/j.jbi.2020.103541

M3 - Article

C2 - 32814201

AN - SCOPUS:85090064244

SN - 1532-0464

VL - 110

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

M1 - 103541

ER -
