Serendipity - A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media

Boshu Ru, Dingcheng Li, Yueqi Hu, Lixia Yao

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Contextual information helped to reduce the false-positive rate of the deep neural network models. When trained on an extremely imbalanced dataset with few instances of serendipitous drug usage, the deep neural network models did not outperform other machine-learning models that used n-gram and context features. However, deep neural network models could use word embeddings more effectively in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
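The class imbalance reported above (447 positive sentences out of 15714) can be made concrete with a small arithmetic sketch. This is not the authors' evaluation code. The function names are hypothetical and only the two counts come from the abstract. It illustrates why plain accuracy is uninformative at this positive rate, and why measures such as precision, recall, and false-positive rate are the more natural way to compare models on data like this.

```python
from typing import Tuple

# Counts taken from the abstract; everything else here is illustrative.
TOTAL_SENTENCES = 15714
POSITIVE_SENTENCES = 447  # sentences describing serendipitous drug usage


def positive_rate(positives: int, total: int) -> float:
    """Fraction of sentences labeled positive."""
    return positives / total


def majority_baseline_accuracy(positives: int, total: int) -> float:
    """Accuracy of a trivial classifier that labels every sentence negative."""
    return (total - positives) / total


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of negative sentences wrongly flagged as positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    rate = positive_rate(POSITIVE_SENTENCES, TOTAL_SENTENCES)
    base = majority_baseline_accuracy(POSITIVE_SENTENCES, TOTAL_SENTENCES)
    print(f"positive rate: {rate:.1%}")               # 2.8%
    print(f"all-negative baseline accuracy: {base:.1%}")  # 97.2%
    # The all-negative baseline finds no positives at all:
    # tp = 0, fp = 0, fn = 447, so precision, recall, and F1 are all 0.0.
    print(precision_recall_f1(0, 0, POSITIVE_SENTENCES))
```

A classifier that never flags a sentence scores about 97.2% accuracy while recovering none of the 447 positive sentences, so accuracy alone cannot separate a useful model from a trivial one on this dataset.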

Original language: English (US)
Article number: 8681431
Pages (from-to): 324-334
Number of pages: 11
Journal: IEEE Transactions on Nanobioscience
Issue number: 3
State: Published - Jul 2019


Keywords

  • Social media
  • data mining
  • drug discovery
  • drug repurposing
  • health informatics

ASJC Scopus subject areas

  • Bioengineering
  • Electrical and Electronic Engineering
  • Biotechnology
  • Biomedical Engineering
  • Medicine (miscellaneous)
  • Computer Science Applications
  • Pharmaceutical Science

