Risk factor detection for heart disease by applying text analytics in electronic medical records

Manabu Torii; Jung wei Fan; Wei li Yang; Theodore Lee; Matthew T. Wiley; Daniel S. Zisook; Yang Huang

doi:10.1016/j.jbi.2015.08.011

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
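As a quick consistency check on the reported figures, the sketch below recomputes F1 as the harmonic mean of the stated precision and recall. It assumes the overall score is derived directly from those two aggregate values (for example, micro-averaged); the abstract does not state the averaging scheme, and the function name `f1_score` is illustrative only, not part of the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
precision = 0.8972
recall = 0.9409

f1 = f1_score(precision, recall)
print(f"{f1:.4f}")  # prints 0.9185
```

The recomputed value agrees with the reported F1 of 0.9185 to four decimal places.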

Original language	English (US)
Pages (from-to)	S164-S170
Journal	Journal of Biomedical Informatics
Volume	58
DOIs	https://doi.org/10.1016/j.jbi.2015.08.011
State	Published - Dec 1 2015

Keywords

Medical records
Natural language processing
Risk assessment
Text classification

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{d085b7c744624c19b3d40fa5a427d1ef,
  title = "Risk factor detection for heart disease by applying text analytics in electronic medical records",
  abstract = "In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.",
  keywords = "Medical records, Natural language processing, Risk assessment, Text classification",
  author = "Manabu Torii and Fan, {Jung wei} and Yang, {Wei li} and Theodore Lee and Wiley, {Matthew T.} and Zisook, {Daniel S.} and Yang Huang",
  note = "Publisher Copyright: {\textcopyright} 2015 Elsevier Inc.",
  year = "2015",
  month = dec,
  day = "1",
  doi = "10.1016/j.jbi.2015.08.011",
  language = "English (US)",
  volume = "58",
  pages = "S164--S170",
  journal = "Journal of Biomedical Informatics",
  issn = "1532-0464",
  publisher = "Academic Press Inc.",
}

TY  - JOUR
T1  - Risk factor detection for heart disease by applying text analytics in electronic medical records
AU  - Torii, Manabu
AU  - Fan, Jung wei
AU  - Yang, Wei li
AU  - Lee, Theodore
AU  - Wiley, Matthew T.
AU  - Zisook, Daniel S.
AU  - Huang, Yang
PY  - 2015/12/1
Y1  - 2015/12/1
AB  - In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
KW  - Medical records
KW  - Natural language processing
KW  - Risk assessment
KW  - Text classification
UR  - http://www.scopus.com/inward/record.url?scp=84940099418&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84940099418&partnerID=8YFLogxK
U2  - 10.1016/j.jbi.2015.08.011
DO  - 10.1016/j.jbi.2015.08.011
M3  - Article
C2  - 26279500
AN  - SCOPUS:84940099418
SN  - 1532-0464
VL  - 58
SP  - S164
EP  - S170
JO  - Journal of Biomedical Informatics
JF  - Journal of Biomedical Informatics
ER  -
