An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)

National COVID Cohort Collaborative (N3C) Natural Language Processing (NLP) Subgroup, National COVID Cohort Collaborative (N3C)

doi:10.1093/jamia/ocad134

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, and underscore the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

Original language	English (US)
Pages (from-to)	2036-2040
Number of pages	5
Journal	Journal of the American Medical Informatics Association
Volume	30
Issue number	12
DOIs	https://doi.org/10.1093/jamia/ocad134
State	Published - Dec 1 2023

Keywords

electronic health records
federated learning
multi-institutional data annotation
natural language processing

ASJC Scopus subject areas

Health Informatics

Access to Document

10.1093/jamia/ocad134

Cite this

National COVID Cohort Collaborative (N3C) Natural Language Processing (NLP) Subgroup, National COVID Cohort Collaborative (N3C) (2023). An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C). Journal of the American Medical Informatics Association, 30(12), 2036-2040. https://doi.org/10.1093/jamia/ocad134

@article{205e3337a193403d94ba98fd33775b59,

title = "An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)",

abstract = "Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, and underscore the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.",

keywords = "electronic health records, federated learning, multi-institutional data annotation, natural language processing",

author = "{National COVID Cohort Collaborative (N3C) Natural Language Processing (NLP) Subgroup, National COVID Cohort Collaborative (N3C)} and Sijia Liu and Andrew Wen and Liwei Wang and Huan He and Sunyang Fu and Robert Miller and Andrew Williams and Daniel Harris and Ramakanth Kavuluru and Mei Liu and Noor Abu-El-Rub and Dalton Schutte and Rui Zhang and Masoud Rouhizadeh and Osborne, {John D.} and Yongqun He and Umit Topaloglu and Hong, {Stephanie S.} and Saltz, {Joel H.} and Thomas Schaffter and Emily Pfaff and Chute, {Christopher G.} and Tim Duong and Haendel, {Melissa A.} and Rafael Fuentes and Peter Szolovits and Hua Xu and Hongfang Liu",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.",

year = "2023",

month = dec,

day = "1",

doi = "10.1093/jamia/ocad134",

language = "English (US)",

volume = "30",

pages = "2036--2040",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "12",

}

TY - JOUR

T1 - An open natural language processing (NLP) framework for EHR-based clinical research

T2 - a case demonstration using the National COVID Cohort Collaborative (N3C)

AU - National COVID Cohort Collaborative (N3C) Natural Language Processing (NLP) Subgroup, National COVID Cohort Collaborative (N3C)

AU - Liu, Sijia

AU - Wen, Andrew

AU - Wang, Liwei

AU - He, Huan

AU - Fu, Sunyang

AU - Miller, Robert

AU - Williams, Andrew

AU - Harris, Daniel

AU - Kavuluru, Ramakanth

AU - Liu, Mei

AU - Abu-El-Rub, Noor

AU - Schutte, Dalton

AU - Zhang, Rui

AU - Rouhizadeh, Masoud

AU - Osborne, John D.

AU - He, Yongqun

AU - Topaloglu, Umit

AU - Hong, Stephanie S.

AU - Saltz, Joel H.

AU - Schaffter, Thomas

AU - Pfaff, Emily

AU - Chute, Christopher G.

AU - Duong, Tim

AU - Haendel, Melissa A.

AU - Fuentes, Rafael

AU - Szolovits, Peter

AU - Xu, Hua

AU - Liu, Hongfang

PY - 2023/12/1

Y1 - 2023/12/1

N2 - Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, and underscore the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

KW - electronic health records

KW - federated learning

KW - multi-institutional data annotation

KW - natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85185388095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85185388095&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocad134

DO - 10.1093/jamia/ocad134

M3 - Article

C2 - 37555837

AN - SCOPUS:85185388095

SN - 1067-5027

VL - 30

SP - 2036

EP - 2040

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 12

ER -
