Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment

Imon Banerjee, Kevin Li, Martin Seneviratne, Michelle Ferrari, Tina Seto, James D. Brooks, Daniel L. Rubin, Tina Hernandez-Boussard

Research output: Contribution to journal (article, peer-reviewed)


Background: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. We present and demonstrate the accuracy of an NLP pipeline that assesses the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD).

Methods: We propose a weakly supervised NLP approach that annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. The sentence vectors served as input to a multinomial logistic model whose output was presence, absence, or risk discussion of UI/BD. The classifier was trained on automated sentence annotations derived solely from domain-specific dictionaries (weak supervision).

Results: The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin.

Conclusions: We demonstrate a machine learning method for categorizing clinical notes by important PCOs. The method trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manual engineering of linguistic rules or manual chart review to extract the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared with rule-based algorithms.

Trial registration: This is a chart review study and was approved by the Institutional Review Board (IRB).
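
The approach described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term lists, the helper names (`tokenize`, `weak_label`, `sentence_vector`), the toy embeddings, and the use of a plain weighted average are all hypothetical. The abstract does not specify the actual dictionaries or the weighting function.

```python
# Hypothetical mini-dictionaries for urinary incontinence (UI).
# The paper's real domain-specific dictionaries are not given in the abstract.
TARGET_TERMS = {"incontinence", "leakage"}
NEGATION_TERMS = {"no", "denies", "without"}
RISK_TERMS = {"risk", "risks", "discussed", "possible"}


def tokenize(sentence):
    """Lowercase the sentence and split on whitespace after dropping basic punctuation."""
    cleaned = sentence.lower().replace(".", " ").replace(",", " ")
    return cleaned.split()


def weak_label(sentence):
    """Assign a weak label from dictionary matches alone (no manual annotation).

    Returns "risk", "absence", or "presence", or None when the sentence
    does not mention a target term at all.
    """
    tokens = set(tokenize(sentence))
    if not tokens & TARGET_TERMS:
        return None
    if tokens & RISK_TERMS:
        return "risk"
    if tokens & NEGATION_TERMS:
        return "absence"
    return "presence"


def sentence_vector(tokens, embeddings, weights):
    """Weighted average of word vectors; tokens without an embedding are skipped.

    `weights` maps a word to its importance (assumed here; for example an
    inverse-document-frequency score). Words missing from `weights` get 1.0.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue
        w = weights.get(tok, 1.0)
        for i, value in enumerate(embeddings[tok]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [value / weight_sum for value in total]


# Toy 2-dimensional embeddings, purely for illustration.
toy_embeddings = {"denies": [0.0, 1.0], "incontinence": [1.0, 0.0]}
toy_weights = {"denies": 3.0, "incontinence": 1.0}

sentence = "Patient denies urinary incontinence."
label = weak_label(sentence)
vector = sentence_vector(tokenize(sentence), toy_embeddings, toy_weights)
```

The resulting (vector, label) pairs would then serve as training data for a three-class multinomial logistic model, for instance scikit-learn's `LogisticRegression`, which is left out here to keep the sketch dependency-free.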

Original language: English (US)
Article number: ooy057
Pages (from-to): 150-159
Number of pages: 10
Journal: JAMIA Open
Issue number: 1
State: Published - Jan 4 2019


Keywords

  • natural language processing
  • neural word embedding
  • patient-centered outcomes
  • prostate cancer
  • text mining

ASJC Scopus subject areas

  • Health Informatics

