TY - JOUR
T1 - Electronic medical records for clinical research
T2 - Application to the identification of heart failure
AU - Pakhomov, Serguei
AU - Weston, Susan A.
AU - Jacobsen, Steven J.
AU - Chute, Christopher G.
AU - Meverden, Ryan
AU - Roger, Veronique L.
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007/6
Y1 - 2007/6
N2 - Objective: To identify patients with heart failure (HF) by using language contained in the electronic medical record (EMR). Methods: We validated 2 methods of identifying HF through the EMR, which offers transcription of clinical notes within 24 hours or less of the encounter. The first method was natural language processing (NLP) of the EMR text. The second method was predictive modeling based on machine learning, using the text of clinical reports. Natural language processing was compared with both manual record review and billing records. Predictive modeling was compared with manual record review. Results: Natural language processing identified 2904 HF cases; billing records independently identified 1684 HF cases, 252 (15%) of them not identified by NLP. Review of a random sample of these 252 cases did not identify HF, yielding 100% sensitivity (95% confidence interval [CI] = 86, 100) and 97.8% specificity (95% CI = 97.7, 97.9) for NLP. Manual review confirmed 1107 of the 2904 cases identified by NLP, yielding a positive predictive value (PPV) of 38% (95% CI = 36, 40). Predictive modeling yielded a PPV of 82% (95% CI = 73, 93), 56% sensitivity (95% CI = 46, 67), and 96% specificity (95% CI = 94, 99). Conclusions: The EMR can be used to identify HF via 2 complementary approaches. Natural language processing may be more suitable for studies requiring highest sensitivity, whereas predictive modeling may be more suitable for studies requiring higher PPV.
UR - http://www.scopus.com/inward/record.url?scp=34250895016&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250895016&partnerID=8YFLogxK
M3 - Article
C2 - 17567225
AN - SCOPUS:34250895016
SN - 1088-0224
VL - 13
SP - 281
EP - 288
JO - American Journal of Managed Care
JF - American Journal of Managed Care
IS - 6 I
ER -