A weakly supervised model for the automated detection of adverse events using clinical notes

Josh Sanyal, Daniel Rubin, Imon Banerjee

Research output: Contribution to journal › Article › peer-review


Because clinical trials cannot detect all potential adverse reactions to drugs and medical devices before they are released to the market, accurate post-market surveillance is critical to ensuring their safety and efficacy. Electronic health records (EHR) contain rich observational patient data, making them a valuable source for actively monitoring the safety of drugs and devices. While structured EHR data and spontaneous reporting systems often fail to capture the complexity of patient encounters and outcomes, free-text clinical notes offer greater detail about a patient's status. Previous studies have proposed machine learning methods to detect adverse events from clinical notes, but these methods rely on manually extracted features and costly hand-labeled data, and they lack validation on external datasets. To address these challenges, we develop a weakly supervised machine learning framework for adverse event detection from unstructured clinical notes and evaluate it on insulin pump failure as a test case. Our model detected cases of pump failure with a precision-recall area under the curve (PR AUC) of 0.842 on the held-out test set and 0.815 on an external validation dataset. Our approach allowed us to leverage a large dataset while requiring far less hand-labeled data, and it can be readily transferred to other adverse events for scalable post-market surveillance.
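
The abstract does not say how the weak labels are produced or how PR AUC is computed. As a minimal sketch, assuming a common weak-supervision setup, hand-written labeling functions vote on each note and a simple majority vote turns those votes into noisy training labels; average precision is then shown as one standard way to summarize the area under the precision-recall curve. The keyword rules, label encoding, and every function name below (lf_pump_malfunction, weak_label, average_precision, and so on) are hypothetical illustrations, not the authors' pipeline.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label encoding: 1 = pump failure, 0 = no failure, None = abstain.
FAILURE, NO_FAILURE, ABSTAIN = 1, 0, None

LabelingFunction = Callable[[str], Optional[int]]

# Hypothetical keyword heuristics, NOT the rules used in the paper.
def lf_pump_malfunction(note: str) -> Optional[int]:
    # Vote "failure" when the note says the pump malfunctioned, failed, or stopped.
    if re.search(r"pump\s+(malfunction|failure|failed|stopped)", note, re.I):
        return FAILURE
    return ABSTAIN

def lf_occlusion_alarm(note: str) -> Optional[int]:
    # Vote "failure" when an occlusion alarm is mentioned.
    if re.search(r"occlusion\s+alarm", note, re.I):
        return FAILURE
    return ABSTAIN

def lf_pump_working(note: str) -> Optional[int]:
    # Vote "no failure" when the note says the pump is working normally.
    if re.search(r"pump\s+(working|functioning)\s+(well|normally|properly)", note, re.I):
        return NO_FAILURE
    return ABSTAIN

LFS: List[LabelingFunction] = [lf_pump_malfunction, lf_occlusion_alarm, lf_pump_working]

def weak_label(note: str, lfs: List[LabelingFunction] = LFS) -> Optional[int]:
    """Combine labeling-function votes by majority; no votes or a tie -> abstain."""
    votes = [v for v in (lf(note) for lf in lfs) if v is not None]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return counts[0][0]

def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Mean of the precision at the rank of each positive example (no tie handling)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

if __name__ == "__main__":
    notes = [
        "Patient reports insulin pump failed overnight.",
        "Pump functioning normally, no issues.",
        "Occlusion alarm overnight, but pump working properly today.",
        "Routine follow-up visit.",
    ]
    for note in notes:
        print(weak_label(note), "|", note)
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In a real weak-supervision pipeline, the noisy labels would typically be denoised (for example with a learned label model rather than a plain majority vote) and then used to train a text classifier whose scores feed the precision-recall evaluation; the majority vote here is only the simplest stand-in.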

Original language: English (US)
Article number: 103969
Journal: Journal of Biomedical Informatics
State: Published - Feb 2022

Keywords

  • Insulin pump failure
  • Natural language processing
  • Scalable post-market surveillance
  • Unstructured clinical notes

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

