A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification

Himanshu S. Sahoo; Greg M. Silverman; Nicholas E. Ingraham; Monica I. Lupei; Michael A. Puskarich; Raymond L. Finzel; John Sartori; Rui Zhang; Benjamin C. Knoll; Sijia Liu; Hongfang Liu; Genevieve B. Melton; Christopher J. Tignanelli; Serguei V.S. Pakhomov

doi:10.1093/jamiaopen/ooab070

Digital Health Sciences

Research output: Contribution to journal, peer-reviewed article

Abstract

Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, which limit their real-world integration with CDS. A potential solution to these issues is the rule-based gazetteer developed at our institution.

Materials and Methods: The performance, resource utilization, and runtime of the rule-based gazetteer were compared with those of five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger.

Results: The rule-based gazetteer was the fastest, had a low resource footprint, and had performance similar to the other annotation systems on weighted microaverage and macroaverage precision, recall, and F1-score.

Discussion: Opportunities to increase its performance include fine-tuning the lexical rules for symptom identification. Additionally, it could be run on multiple compute nodes to reduce runtime.

Conclusion: The rule-based gazetteer overcame key technical limitations, facilitating real-time identification of COVID-19 symptoms and the integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings to surveil acute COVID-19 symptoms for integration into prognostic modeling. Such a system is currently being used to monitor the progression of postacute sequelae of COVID-19 (PASC) in COVID-19 survivors. This study developed, and conducted the first in-depth analysis of, a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and weighted microaverage and macroaverage precision, recall, and F1-score similar to those of industry-standard annotation systems.

Original language	English (US)
Article number	ooab070
Journal	JAMIA Open
Volume	4
Issue number	3
DOIs	https://doi.org/10.1093/jamiaopen/ooab070
State	Published - Jul 1 2021

Keywords

Natural language processing
artificial intelligence
clinical decision support systems
follow-up studies
information extraction
signs and symptoms

ASJC Scopus subject areas

Health Informatics


Cite this

Sahoo, H. S., Silverman, G. M., Ingraham, N. E., Lupei, M. I., Puskarich, M. A., Finzel, R. L., Sartori, J., Zhang, R., Knoll, B. C., Liu, S., Liu, H., Melton, G. B., Tignanelli, C. J., & Pakhomov, S. V. S. (2021). A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification. JAMIA Open, 4(3), Article ooab070. https://doi.org/10.1093/jamiaopen/ooab070

@article{ab461ae947864fdc83244bfd460cc26a,

title = "A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification",

keywords = "Natural language processing, artificial intelligence, clinical decision support systems, follow-up studies, information extraction, signs and symptoms",

author = "Sahoo, {Himanshu S.} and Silverman, {Greg M.} and Ingraham, {Nicholas E.} and Lupei, {Monica I.} and Puskarich, {Michael A.} and Finzel, {Raymond L.} and Sartori, John and Zhang, Rui and Knoll, {Benjamin C.} and Liu, Sijia and Liu, Hongfang and Melton, {Genevieve B.} and Tignanelli, {Christopher J.} and Pakhomov, {Serguei V.S.}",

note = "Publisher Copyright: {\textcopyright} 2021 The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.",

year = "2021",

month = jul,

day = "1",

doi = "10.1093/jamiaopen/ooab070",

language = "English (US)",

volume = "4",

journal = "JAMIA Open",

issn = "2574-2531",

publisher = "Oxford University Press",

number = "3",

}

TY  - JOUR
T1  - A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification
AU  - Sahoo, Himanshu S.
AU  - Silverman, Greg M.
AU  - Ingraham, Nicholas E.
AU  - Lupei, Monica I.
AU  - Puskarich, Michael A.
AU  - Finzel, Raymond L.
AU  - Sartori, John
AU  - Zhang, Rui
AU  - Knoll, Benjamin C.
AU  - Liu, Sijia
AU  - Liu, Hongfang
AU  - Melton, Genevieve B.
AU  - Tignanelli, Christopher J.
AU  - Pakhomov, Serguei V.S.
PY  - 2021/7/1
Y1  - 2021/7/1
KW  - Natural language processing
KW  - artificial intelligence
KW  - clinical decision support systems
KW  - follow-up studies
KW  - information extraction
KW  - signs and symptoms
UR  - http://www.scopus.com/inward/record.url?scp=85118120384&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85118120384&partnerID=8YFLogxK
U2  - 10.1093/jamiaopen/ooab070
DO  - 10.1093/jamiaopen/ooab070
M3  - Article
AN  - SCOPUS:85118120384
SN  - 2574-2531
VL  - 4
JO  - JAMIA Open
JF  - JAMIA Open
IS  - 3
M1  - ooab070
ER  - 
