Developing customizable cancer information extraction modules for pathology reports using CLAMP

Ergin Soysal, Jeremy L. Warner, Jingqi Wang, Min Jiang, Krysten Harvey, Sandeep Kumar Jain, Xiao Dong, Hsing Yi Song, Harish Siddhanamatha, Liwei Wang, Qi Dai, Qingxia Chen, Xianglin Du, Cui Tao, Ping Yang, Joshua Charles Denny, Hongfang Liu, Hua Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated extraction of phenotypic information from narratives in electronic health records (EHRs), such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting a comprehensive range of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers) by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that a customized NLP system could be built quickly, with performance comparable to that of an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.

Original language: English (US)
Title of host publication: MEDINFO 2019
Subtitle of host publication: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics
Editors: Brigitte Seroussi, Lucila Ohno-Machado
Publisher: IOS Press
Number of pages: 5
ISBN (Electronic): 9781643680026
State: Published - Aug 21 2019
Event: 17th World Congress on Medical and Health Informatics, MEDINFO 2019 - Lyon, France
Duration: Aug 25 2019 → Aug 30 2019

Publication series

Name: Studies in Health Technology and Informatics
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365


Conference: 17th World Congress on Medical and Health Informatics, MEDINFO 2019


  • Electronic Health Records
  • Information Storage and Retrieval
  • Natural Language Processing

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

