TY - JOUR
T1 - Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data
AU - Hong, Na
AU - Wen, Andrew
AU - Shen, Feichen
AU - Sohn, Sunghwan
AU - Wang, Chen
AU - Liu, Hongfang
AU - Jiang, Guoqian
N1 - Publisher Copyright:
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
PY - 2019/12/1
Y1 - 2019/12/1
AB - Objective: To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods: We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic's unstructured EHR data. We constructed a gold standard reusing annotation corpora from previous NLP projects. Results: A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69-0.99 for Condition; 0.75-0.84 for Procedure; 0.71-0.99 for MedicationStatement; and 0.75-0.95 for FamilyMemberHistory). Conclusion: We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools for clinical data normalization that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future developments of the FHIR specifications with regard to handling unstructured clinical data.
KW - Data standards
KW - Electronic health records
KW - Fast Healthcare Interoperability Resources
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85087152285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087152285&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooz056
DO - 10.1093/jamiaopen/ooz056
M3 - Article
AN - SCOPUS:85087152285
SN - 2574-2531
VL - 2
SP - 570
EP - 579
JO - JAMIA Open
JF - JAMIA Open
IS - 4
ER -