A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification

Himanshu S. Sahoo, Greg M. Silverman, Nicholas E. Ingraham, Monica I. Lupei, Michael A. Puskarich, Raymond L. Finzel, John Sartori, Rui Zhang, Benjamin C. Knoll, Sijia Liu, Hongfang Liu, Genevieve B. Melton, Christopher J. Tignanelli, Serguei V.S. Pakhomov

Research output: Contribution to journal › Article › peer-review


Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, which limit real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution.

Materials and Methods: Performance, resource utilization, and runtime of the rule-based gazetteer were compared with five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger.

Results: This rule-based gazetteer was the fastest, had a low resource footprint, and performed similarly to the other annotation systems on weighted microaverage and macroaverage measures of precision, recall, and F1-score.

Discussion: Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for faster runtime.

Conclusion: This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings for surveillance of acute COVID-19 symptoms for integration into prognostic modeling. Such a system is currently being leveraged for monitoring of postacute sequelae of COVID-19 (PASC) progression in COVID-19 survivors.

This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and similar weighted microaverage and macroaverage measures for precision, recall, and F1-score compared to industry-standard annotation systems.
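As a rough illustration of the two ideas the abstract relies on, the sketch below shows a dictionary-style (gazetteer) lookup for symptom terms and the difference between micro- and macro-averaged precision, recall, and F1-score. It is not the system evaluated in the study: the lexicon entries, the function names (`find_symptoms`, `prf`, `micro_macro`), and the sample notes are all hypothetical, the lookup ignores negation (so "denies fever" would still match), and the averages are unweighted, whereas the abstract reports weighted measures.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: surface term -> normalized symptom label.
# Illustrative assumption only, not the lexicon used in the study.
LEXICON = {
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "shortness of breath": "dyspnea",
    "dyspnea": "dyspnea",
}

# Single alternation, longest terms first so multiword phrases are tried first.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_symptoms(text):
    """Return the set of normalized symptom labels mentioned in text."""
    return {LEXICON[m.group(1).lower()] for m in PATTERN.finditer(text)}


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred, labels):
    """Compare per-note label sets; return (micro, macro) P/R/F1 tuples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                tp[lab] += 1
            elif lab in p:
                fp[lab] += 1
            elif lab in g:
                fn[lab] += 1
    # Micro: pool the counts over all labels, then compute the metrics once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute the metrics per label, then take the unweighted mean.
    per_label = [prf(tp[l], fp[l], fn[l]) for l in labels]
    macro = tuple(sum(x[i] for x in per_label) / len(labels) for i in range(3))
    return micro, macro


if __name__ == "__main__":
    notes = ["Patient is febrile with a dry cough.", "Reports shortness of breath."]
    gold = [{"fever", "cough"}, {"dyspnea"}]
    pred = [find_symptoms(n) for n in notes]
    print(micro_macro(gold, pred, ["fever", "cough", "dyspnea"]))
```

Micro-averaging pools counts across all symptom labels, so frequent symptoms dominate the score; macro-averaging gives every label equal weight, which makes weak performance on rare symptoms more visible.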

Original language: English (US)
Article number: ooab070
Journal: JAMIA Open
Issue number: 3
State: Published - Jul 1 2021


Keywords

  • Natural language processing
  • artificial intelligence
  • clinical decision support systems
  • follow-up studies
  • information extraction
  • signs and symptoms

ASJC Scopus subject areas

  • Health Informatics

