Algorithm to Identify Type 2 Diabetes Using Electronic Health Record and Self-Reported Data

Ben T. Varghese, Marlene E. Girardo, Ruchi Gupta, Karen M. Fischer, Madison Duellman, Michelle M. Mielke, Aoife M. Egan, Janet E. Olson, Adrian Vella, Kent R. Bailey, Sagar B. Dugani

Research output: Contribution to journal › Article › peer-review

Abstract

Aims: Identifying participants with type 2 diabetes (T2D) based only on electronic health record (EHR) data or only on self-reported data has limited accuracy. The objective of this study was therefore to develop an algorithm that uses both EHR and self-reported data to identify participants with and without T2D.

Methods: We included participants enrolled in the Mayo Clinic Biobank. At enrollment, participants completed a baseline questionnaire on health conditions, including T2D, and provided access to their EHR data. T2D status was based on self-report and EHR data (International Classification of Diseases codes, hemoglobin A1c [HbA1c], plasma glucose, and glucose-regulating medications) within 5 years prior to and 2 months after enrollment. Participants who self-reported T2D but lacked corroborating EHR data were categorized separately (“only self-reported T2D”). After identifying participants with T2D, we identified participants without T2D based on normal HbA1c and plasma glucose. Participants who self-reported the absence of T2D but lacked corroborating EHR data were categorized separately (“only self-reported no T2D”). Using manual chart reviews as the gold standard, we calculated the positive predictive value (PPV) and negative predictive value (NPV) of the algorithm for identifying T2D.

Results: Of 57,000 participants, the algorithm classified participants as having T2D (n = 6,238), no T2D (n = 38,883), “only self-reported T2D” (n = 757), and “only self-reported no T2D” (n = 9,759). The algorithm had a high PPV (96.0% [91.5%−98.5%]), NPV (100% [98.0%−100%]), and accuracy (99.5% [98.3%−99.8%]). Median (range) participant age varied across categories from 52 (18-98) years (only self-reported T2D) to 67 (19-99) years (T2D) (P < 0.0001), and the proportion of women varied from 45.3% (T2D) to 69.6% (only self-reported no T2D) (P < 0.0001). Most participants were of the White race (84.0%−92.7%) and non-Hispanic ethnicity (97.6%−98.6%).

Conclusions: In this study, we developed an algorithm to accurately identify participants with and without T2D, which may be generalizable to cohorts with linked EHR data.
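
The ordering of steps described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the laboratory cutoffs, the exact rule for combining self-report with EHR evidence, or the handling of participants who fit none of the four categories, so the sketch assumes those inputs are already reduced to flags. The names `Participant`, `ehr_t2d_evidence`, `ehr_normal_labs`, `classify`, and `predictive_values`, as well as the "unclassified" fallback, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # All fields are hypothetical summaries of data in the window around
    # enrollment; how each flag is derived is NOT specified in the abstract.
    self_report_t2d: Optional[bool]  # None = questionnaire item unanswered
    ehr_t2d_evidence: bool           # assumed: ICD code, labs, or medication supports T2D
    ehr_normal_labs: bool            # assumed: normal HbA1c and plasma glucose on record


def classify(p: Participant) -> str:
    """Assign one category, checking for T2D before checking for no T2D."""
    if p.ehr_t2d_evidence:
        return "T2D"
    if p.self_report_t2d is True:
        # Self-reported T2D without corroborating EHR data.
        return "only self-reported T2D"
    if p.ehr_normal_labs:
        return "no T2D"
    if p.self_report_t2d is False:
        # Self-reported absence of T2D without corroborating EHR data.
        return "only self-reported no T2D"
    return "unclassified"  # assumed fallback, not a category from the abstract


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (PPV, NPV, accuracy) from chart-review confusion counts."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, accuracy


if __name__ == "__main__":
    print(classify(Participant(True, False, False)))
    # Made-up counts for illustration only; these are not study data.
    print(predictive_values(tp=45, fp=5, tn=48, fn=2))
```

Checking for T2D first mirrors the sequence in the Methods, where participants without T2D were identified only after those with T2D. The validation helper uses the standard definitions PPV = TP / (TP + FP) and NPV = TN / (TN + FN), with manual chart review supplying the true status.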

Original language: English (US)
Pages (from-to): 186-192
Number of pages: 7
Journal: Metabolic Syndrome and Related Disorders
Volume: 23
Issue number: 4
State: Published - May 1 2025

Keywords

  • algorithm
  • biobank
  • diabetes
  • electronic health record
  • self-report

ASJC Scopus subject areas

  • Internal Medicine
  • Endocrinology, Diabetes and Metabolism
