A study of transportability of an existing smoking status detection module across institutions.

Mei Liu, Anushi Shah, Min Jiang, Neeraja B. Peterson, Q. Dai, Melinda C. Aldrich, Qingxia Chen, Erica A. Bowton, Hongfang Liu, Joshua C. Denny, Hua Xu

Research output: Contribution to journal › Article › peer-review

26 Scopus citations


Electronic medical records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor in many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed to extract it, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to EMR data from Vanderbilt University Hospital. Our evaluation showed that a modest amount of customization is necessary to achieve desirable performance. We modified the system by filtering notes, annotating new data to train the machine learning classifier, and adding rules to the rule-based classifiers. The customized module achieved significantly higher F-measures at all levels of classification (i.e., sentence, document, and patient) than the cTAKES module applied directly to the Vanderbilt data.
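
The F-measure used to compare the two configurations can be sketched as below. This is a minimal illustration, not the evaluation code used in the study: the function name `f_measure`, the label names, and the sample data are all hypothetical, and the balanced form (beta = 1) is an assumption, since the abstract does not state the weighting. The same computation would apply at the sentence, document, or patient level, given one gold label and one predicted label per unit.

```python
def f_measure(gold, predicted, label, beta=1.0):
    """Return (precision, recall, F) for one class label.

    gold and predicted are equal-length sequences of labels, one per
    unit being classified (a sentence, a document, or a patient).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical labels and data, for illustration only.
gold = ["current", "past", "never", "current", "unknown"]
pred = ["current", "current", "never", "past", "unknown"]

for label in ("current", "never"):
    p, r, f = f_measure(gold, pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Computing the score per label, as here, keeps a rare class from being hidden by a frequent one; averaging across labels is a separate choice the abstract does not describe.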

Original language: English (US)
Pages (from-to): 577-586
Number of pages: 10
Journal: Unknown Journal
State: Published - 2012

ASJC Scopus subject areas

  • Medicine(all)

