A multicenter evaluation of computable phenotyping approaches for SARS-CoV-2 infection and COVID-19 hospitalizations

Rohan Khera, Bobak J. Mortazavi, Veer Sangha, Frederick Warner, H. Patrick Young, Joseph S. Ross, Nilay D. Shah, Elitza S. Theel, William G. Jenkinson, Camille Knepper, Karen Wang, David Peaper, Richard A. Martinello, Cynthia A. Brandt, Zhenqiu Lin, Albert I. Ko, Harlan M. Krumholz, Benjamin D. Pollock, Wade L. Schulz

Research output: Contribution to journal › Article › peer-review


Diagnosis codes are used to study SARS-CoV-2 infections and COVID-19 hospitalizations in administrative and electronic health record (EHR) data. Using EHR data (April 2020–March 2021) from the Yale-New Haven Health System and the three hospital systems of the Mayo Clinic, we evaluated computable phenotype definitions based on the ICD-10 diagnosis code for COVID-19 (U07.1) against positive SARS-CoV-2 PCR or antigen tests. We included 69,423 patients at Yale and 75,748 at Mayo Clinic with either a diagnosis code or a positive SARS-CoV-2 test. The precision and recall of a COVID-19 diagnosis code for identifying a positive test were 68.8% and 83.3%, respectively, at Yale, with higher precision (95%) and lower recall (63.5%) at Mayo Clinic, varying from 59.2% in Rochester to 97.3% in Arizona. For hospitalizations with a principal COVID-19 diagnosis, 94.8% at Yale and 80.5% at Mayo Clinic had an associated positive laboratory test, and a secondary COVID-19 diagnosis identified additional patients. These patients had twofold higher in-hospital mortality than those identified by principal diagnosis. Standardization of coding practices is needed before diagnosis codes are used in clinical research and epidemiological surveillance of COVID-19.
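The precision and recall figures above treat a positive SARS-CoV-2 test as the reference standard for a U07.1 diagnosis code. The sketch below illustrates that arithmetic at the patient level only; it is not the authors' pipeline. The function name `phenotype_metrics` and the patient IDs are hypothetical, and the sketch ignores any time window linking a code to a test, which a real phenotype definition would likely need.

```python
def phenotype_metrics(coded: set[str], test_positive: set[str]) -> tuple[float, float]:
    """Hypothetical helper: precision and recall of a diagnosis-code phenotype.

    coded         -- patient IDs carrying the diagnosis code (e.g. U07.1)
    test_positive -- patient IDs with a positive PCR or antigen test (reference)

    precision = |coded & test_positive| / |coded|
    recall    = |coded & test_positive| / |test_positive|
    """
    if not coded or not test_positive:
        raise ValueError("both patient sets must be non-empty")
    true_positives = len(coded & test_positive)  # coded AND test-positive
    precision = true_positives / len(coded)
    recall = true_positives / len(test_positive)
    return precision, recall


# Made-up toy cohort; the study cohort is the union of both groups.
coded_u071 = {"p1", "p2", "p3", "p4"}
positive_test = {"p2", "p3", "p4", "p5", "p6"}

precision, recall = phenotype_metrics(coded_u071, positive_test)
print(f"precision={precision:.1%} recall={recall:.1%}")  # precision=75.0% recall=60.0%
```

Here three of the four coded patients tested positive (precision 3/4), and three of the five test-positive patients were coded (recall 3/5). Low recall means coded data miss infected patients; low precision means coded patients lack a confirming test.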

Original language: English (US)
Article number: 27
Journal: npj Digital Medicine
Issue number: 1
State: Published - Dec 2022

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Health Informatics
  • Computer Science Applications
  • Health Information Management

