Artificial intelligence for the assessment of bowel preparation

Ji Young Lee, Audrey H. Calderwood, William Karnes, James Requa, Brian C. Jacobson, Michael B. Wallace

Research output: Contribution to journal › Article › peer-review


Background and Aims: A reliable assessment of bowel preparation is important to ensure high-quality colonoscopy. Current bowel preparation scoring systems are limited by interobserver variability. This study aimed to demonstrate objective assessment of bowel preparation adequacy using an artificial intelligence (AI)/convolutional neural network (CNN) algorithm developed from colonoscopy videos.
Methods: Two CNNs were developed using a training set of 73,304 images from 200 colonoscopies. First, a binary CNN was developed and trained to distinguish video frames that were appropriate versus inappropriate for scoring with the Boston Bowel Preparation Scale (BBPS). Second, a multiclass CNN was developed and trained on 26,950 appropriate frames that were expertly annotated with BBPS segment scores (0-3). We validated the algorithm using 252 10-second video clips that were assigned BBPS segment scores by 2 experts. The algorithm produced an AI-based BBPS score (AI-BBPS) by averaging the BBPS scores it assigned to individual frames. We maximized the algorithm's performance by choosing the AI-BBPS cutoff whose dichotomized result most closely matched the dichotomized expert BBPS scores (ie, adequate vs inadequate). We tested the AI-BBPS against human rating using 30 independent 10-second video clips (test set 1) and 10 full withdrawal colonoscopy videos (test set 2).
Results: In the validation set, the algorithm demonstrated an area under the curve of .918 and accuracy of 85.3% for detection of inadequate bowel cleanliness. In test set 1, sensitivity for inadequate bowel preparation was 100% and agreement between raters and AI was 76.7% to 83.3%. In test set 2, sensitivity for inadequate bowel preparation for each segment was 100% and agreement between raters and AI was 68.9% to 89.7%. Agreement between raters alone versus raters and AI was similar (κ = .694 and .649, respectively).
Conclusions: The algorithm's assessment of bowel cleanliness, as measured with the BBPS, showed good performance and agreement with experts, including in full withdrawal colonoscopies.
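
The scoring steps described in the Methods (keep only frames judged appropriate for scoring, average the per-frame BBPS scores, then dichotomize against expert labels) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ai_bbps` and `choose_cutoff`, the candidate cutoff values, and the rule "a mean score below the cutoff counts as inadequate" are assumptions made only for illustration, and the CNN outputs are represented here as plain lists.

```python
from statistics import mean
from typing import Optional, Sequence


def ai_bbps(frame_scores: Sequence[int],
            frame_is_scorable: Sequence[bool]) -> Optional[float]:
    """Average the per-frame BBPS scores (0-3) over scorable frames only.

    frame_is_scorable stands in for the binary CNN's output and
    frame_scores for the multiclass CNN's output (both hypothetical inputs).
    Returns None when no frame is scorable.
    """
    kept = [s for s, ok in zip(frame_scores, frame_is_scorable) if ok]
    return mean(kept) if kept else None


def choose_cutoff(clip_scores: Sequence[float],
                  expert_inadequate: Sequence[bool],
                  candidates: Sequence[float]) -> float:
    """Pick the candidate cutoff that best matches expert adequacy labels.

    Assumption: a clip is called inadequate when its mean score is below
    the cutoff. Ties are resolved in favor of the earliest candidate.
    """
    def accuracy(cutoff: float) -> float:
        hits = sum((score < cutoff) == label
                   for score, label in zip(clip_scores, expert_inadequate))
        return hits / len(clip_scores)

    return max(candidates, key=accuracy)


# Illustrative values only; these are not data from the study.
clip_mean = ai_bbps([3, 2, 0, 1], [True, True, False, True])
cutoff = choose_cutoff([0.5, 1.2, 2.4, 2.9],
                       [True, True, False, False],
                       [1.0, 1.5, 2.0, 2.5])
```

Choosing the cutoff by simple accuracy is one plausible reading of "closely matched"; the abstract does not state which criterion the authors optimized.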

Original language: English (US)
Pages (from-to): 512-518.e1
Journal: Gastrointestinal Endoscopy
Issue number: 3
State: Published - Mar 2022

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging
  • Gastroenterology

