TY - JOUR
T1 - Algorithm to Identify Type 2 Diabetes Using Electronic Health Record and Self-Reported Data
AU - Varghese, Ben T.
AU - Girardo, Marlene E.
AU - Gupta, Ruchi
AU - Fischer, Karen M.
AU - Duellman, Madison
AU - Mielke, Michelle M.
AU - Egan, Aoife M.
AU - Olson, Janet E.
AU - Vella, Adrian
AU - Bailey, Kent R.
AU - Dugani, Sagar B.
N1 - Publisher Copyright: Copyright 2025, Mary Ann Liebert, Inc., publishers.
PY - 2025/5/1
Y1 - 2025/5/1
N2 - Aims: Identifying participants with type 2 diabetes (T2D) based only on electronic health record (EHR) or self-reported data has limited accuracy. Therefore, the objective of the study was to develop an algorithm using EHR and self-reported data to identify participants with and without T2D. Methods: We included participants enrolled in the Mayo Clinic Biobank. At enrollment, participants completed a baseline questionnaire on health conditions, including T2D, and provided access to their EHR data. T2D status was based on self-report and EHR data (International Classification of Diseases codes, hemoglobin A1c [HbA1c], plasma glucose, and glucose-regulating medications) within 5 years prior to and 2 months after enrollment. Participants who self-reported T2D but lacked corroborating EHR data were categorized separately (“only self-reported T2D”). After identifying participants with T2D, we identified participants without T2D based on normal HbA1c and plasma glucose. Participants who self-reported the absence of T2D but lacked corroborating EHR data were categorized separately (“only self-reported no T2D”). Using manual chart reviews (gold standard), we calculated the positive and negative predictive values (NPV) to identify T2D. Results: Of 57,000 participants, the algorithm classified participants as having T2D (n = 6,238), no T2D (n = 38,883), “only self-reported T2D” (n = 757), and “only self-reported no T2D” (n = 9,759). The algorithm had a high positive predictive value (96.0% [91.5%–98.5%]), NPV (100% [98.0%–100%]), and accuracy (99.5% [98.3%–99.8%]). Participant age (median [range]) ranged from 52 (18–98) years (only self-reported T2D) to 67 (19–99) years (T2D) (P < 0.0001), and the proportion of women ranged from 45.3% (T2D) to 69.6% (only self-reported no T2D) (P < 0.0001). Most participants were of the White race (84.0%–92.7%) and non-Hispanic ethnicity (97.6%–98.6%). Conclusions: In this study, we developed an algorithm to accurately identify participants with and without T2D, which may be generalizable to cohorts with linked EHR data.
AB - Aims: Identifying participants with type 2 diabetes (T2D) based only on electronic health record (EHR) or self-reported data has limited accuracy. Therefore, the objective of the study was to develop an algorithm using EHR and self-reported data to identify participants with and without T2D. Methods: We included participants enrolled in the Mayo Clinic Biobank. At enrollment, participants completed a baseline questionnaire on health conditions, including T2D, and provided access to their EHR data. T2D status was based on self-report and EHR data (International Classification of Diseases codes, hemoglobin A1c [HbA1c], plasma glucose, and glucose-regulating medications) within 5 years prior to and 2 months after enrollment. Participants who self-reported T2D but lacked corroborating EHR data were categorized separately (“only self-reported T2D”). After identifying participants with T2D, we identified participants without T2D based on normal HbA1c and plasma glucose. Participants who self-reported the absence of T2D but lacked corroborating EHR data were categorized separately (“only self-reported no T2D”). Using manual chart reviews (gold standard), we calculated the positive and negative predictive values (NPV) to identify T2D. Results: Of 57,000 participants, the algorithm classified participants as having T2D (n = 6,238), no T2D (n = 38,883), “only self-reported T2D” (n = 757), and “only self-reported no T2D” (n = 9,759). The algorithm had a high positive predictive value (96.0% [91.5%–98.5%]), NPV (100% [98.0%–100%]), and accuracy (99.5% [98.3%–99.8%]). Participant age (median [range]) ranged from 52 (18–98) years (only self-reported T2D) to 67 (19–99) years (T2D) (P < 0.0001), and the proportion of women ranged from 45.3% (T2D) to 69.6% (only self-reported no T2D) (P < 0.0001). Most participants were of the White race (84.0%–92.7%) and non-Hispanic ethnicity (97.6%–98.6%). Conclusions: In this study, we developed an algorithm to accurately identify participants with and without T2D, which may be generalizable to cohorts with linked EHR data.
KW - algorithm
KW - biobank
KW - diabetes
KW - electronic health record
KW - self-report
UR - http://www.scopus.com/inward/record.url?scp=105002419504&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105002419504&partnerID=8YFLogxK
U2 - 10.1089/met.2024.0133
DO - 10.1089/met.2024.0133
M3 - Article
AN - SCOPUS:105002419504
SN - 1540-4196
VL - 23
SP - 186
EP - 192
JO - Metabolic Syndrome and Related Disorders
JF - Metabolic Syndrome and Related Disorders
IS - 4
ER -