Comparing the number and relevance of false activations between 2 artificial intelligence computer-aided detection systems: the NOISE study

Marco Spadaccini; Cesare Hassan; Ludovico Alfarone; Leonardo Da Rio; Roberta Maselli; Silvia Carrara; Piera Alessia Galtieri; Gaia Pellegatta; Alessandro Fugazza; Glenn Koleth; James Emmanuel; Andrea Anderloni; Yuichi Mori; Michael B. Wallace; Prateek Sharma; Alessandro Repici

doi:10.1016/j.gie.2021.12.031

Gastroenterology and Hepatology

Research output: Contribution to journal › Article › peer-review

Abstract

Background and Aims: Artificial intelligence has been shown to be effective in polyp detection, and multiple computer-aided detection (CADe) systems have been developed. False-positive (FP) activation emerged as a possible way to benchmark CADe performance in clinical practice. The aim of this study was to validate a previously developed classification of FPs by comparing the performances of different brands of approved CADe systems.

Methods: We compared 2 different consecutive video libraries (40 videos per arm) collected at Humanitas Research Hospital with 2 different CADe system brands (CADe A and CADe B). For each video, the number of CADe false activations, their cause, and the time spent by the endoscopist examining the erroneously highlighted area were reported. The FP activations were classified by cause and relevance using the previously developed classification of FPs (the NOISE classification).

Results: In CADe A, 1021 FP activations were registered across the 40 videos (25.5 ± 12.2 FPs per colonoscopy), whereas in CADe B, 1028 were identified (25.7 ± 13.2 FPs per colonoscopy; P = .53). Among them, 22.9 ± 9.9 (89.8% in CADe A) and 22.1 ± 10.0 (86.0% in CADe B) were due to artifacts from the bowel wall. Conversely, 2.6 ± 1.9 (10.2% in CADe A) and 3.5 ± 2.1 (14% in CADe B) were caused by bowel content (P = .45). Within CADe A, each false activation required .2 ± .9 seconds, with 1.6 ± 1.0 FPs (6.3%) requiring additional time for endoscopic assessment. Comparable results were reported within CADe B, with .2 ± .8 seconds spent per false activation and 1.8 ± 1.2 FPs per colonoscopy requiring additional inspection.

Conclusions: The use of a standardized nomenclature provided comparable results with either of the 2 recently approved CADe systems. (Clinical trial registration number: NCT04399590.)
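The per-colonoscopy means and percentage shares in the Results can be approximately re-derived from the totals quoted in the abstract. The following is a minimal sketch, not the authors' analysis: the dictionary keys and the `summarize` function are hypothetical names introduced here, the inputs are the rounded figures from the abstract, and the computed shares may therefore differ from the reported percentages by a few tenths of a point.

```python
# Illustrative re-derivation of summary figures from the abstract.
# All names below are hypothetical; only the numbers come from the abstract.

VIDEOS_PER_ARM = 40  # "40 videos per arm"

# Totals and rounded per-colonoscopy means as quoted in the abstract.
arms = {
    "CADe A": {
        "total_fp": 1021,
        "wall_fp_mean": 22.9,        # FPs from bowel-wall artifacts
        "content_fp_mean": 2.6,      # FPs from bowel content
        "extra_time_fp_mean": 1.6,   # FPs requiring additional assessment
    },
    "CADe B": {
        "total_fp": 1028,
        "wall_fp_mean": 22.1,
        "content_fp_mean": 3.5,
        "extra_time_fp_mean": 1.8,
    },
}


def summarize(arm):
    """Return the mean FPs per colonoscopy and the percentage shares by cause."""
    mean_fp = arm["total_fp"] / VIDEOS_PER_ARM
    return {
        "mean_fp": mean_fp,
        "wall_share_pct": 100 * arm["wall_fp_mean"] / mean_fp,
        "content_share_pct": 100 * arm["content_fp_mean"] / mean_fp,
        "extra_time_share_pct": 100 * arm["extra_time_fp_mean"] / mean_fp,
    }


summary = {name: summarize(arm) for name, arm in arms.items()}

for name, stats in summary.items():
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

Because the inputs are already rounded, this only checks that the reported values are mutually consistent; it does not reproduce the standard deviations or P values, which require the per-video data.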

Original language	English (US)
Pages (from-to)	975-981.e1
Journal	Gastrointestinal endoscopy
Volume	95
Issue number	5
DOIs	https://doi.org/10.1016/j.gie.2021.12.031
State	Published - May 2022

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging
Gastroenterology

Cite this

Spadaccini, M., Hassan, C., Alfarone, L., Da Rio, L., Maselli, R., Carrara, S., Galtieri, P. A., Pellegatta, G., Fugazza, A., Koleth, G., Emmanuel, J., Anderloni, A., Mori, Y., Wallace, M. B., Sharma, P., & Repici, A. (2022). Comparing the number and relevance of false activations between 2 artificial intelligence computer-aided detection systems: the NOISE study. Gastrointestinal endoscopy, 95(5), 975-981.e1. https://doi.org/10.1016/j.gie.2021.12.031

Spadaccini, M, Hassan, C, Alfarone, L, Da Rio, L, Maselli, R, Carrara, S, Galtieri, PA, Pellegatta, G, Fugazza, A, Koleth, G, Emmanuel, J, Anderloni, A, Mori, Y, Wallace, MB, Sharma, P & Repici, A 2022, 'Comparing the number and relevance of false activations between 2 artificial intelligence computer-aided detection systems: the NOISE study', Gastrointestinal endoscopy, vol. 95, no. 5, pp. 975-981.e1. https://doi.org/10.1016/j.gie.2021.12.031

@article{ac0991b376184df9bb32c71ecc34c9ff,

title = "Comparing the number and relevance of false activations between 2 artificial intelligence computer-aided detection systems: the NOISE study",

abstract = "Background and Aims: Artificial intelligence has been shown to be effective in polyp detection, and multiple computer-aided detection (CADe) systems have been developed. False-positive (FP) activation emerged as a possible way to benchmark CADe performance in clinical practice. The aim of this study was to validate a previously developed classification of FPs comparing the performances of different brands of approved CADe systems. Methods: We compared 2 different consecutive video libraries (40 video per arm) collected at Humanitas Research Hospital with 2 different CADe system brands (CADe A and CADe B). For each video, the number of CADe false activations, cause, and time spent by the endoscopist to examine the area erroneously highlighted were reported. The FP activations were classified according to the previously developed classification of FPs (the NOISE classification) according to their cause and relevance. Results: In CADe A 1021 FP activations were registered across the 40 videos (25.5 ± 12.2 FPs per colonoscopy), whereas in CADe B 1028 were identified (25.7 ± 13.2 FPs per colonoscopy; P = .53). Among them, 22.9 ± 9.9 (89.8% in CADe A) and 22.1 ± 10.0 (86.0% in CADe B) were because of artifacts from the bowel wall. Conversely, 2.6 ± 1.9 (10.2% in CADe A) and 3.5 ± 2.1 (14% in CADe B) were caused by bowel content (P = .45). Within CADe A each false activation required .2 ± .9 seconds, with 1.6 ± 1.0 FPs (6.3%) requiring additional time for endoscopic assessment. Comparable results were reported within CADe B with .2 ± .8 seconds spent per false activation and 1.8 ± 1.2 FPs per colonoscopy requiring additional inspection. Conclusions: The use of a standardized nomenclature provided comparable results with either of the 2 recently approved CADe systems. (Clinical trial registration number: NCT04399590.)",

author = "Marco Spadaccini and Cesare Hassan and Ludovico Alfarone and {Da Rio}, Leonardo and Roberta Maselli and Silvia Carrara and Galtieri, {Piera Alessia} and Gaia Pellegatta and Alessandro Fugazza and Glenn Koleth and James Emmanuel and Andrea Anderloni and Yuichi Mori and Wallace, {Michael B.} and Prateek Sharma and Alessandro Repici",

note = "Publisher Copyright: {\textcopyright} 2022",

year = "2022",

month = may,

doi = "10.1016/j.gie.2021.12.031",

language = "English (US)",

volume = "95",

pages = "975--981.e1",

journal = "Gastrointestinal endoscopy",

issn = "0016-5107",

publisher = "Mosby Inc.",

number = "5",

}

TY - JOUR

T1 - Comparing the number and relevance of false activations between 2 artificial intelligence computer-aided detection systems

T2 - the NOISE study

AU - Spadaccini, Marco

AU - Hassan, Cesare

AU - Alfarone, Ludovico

AU - Da Rio, Leonardo

AU - Maselli, Roberta

AU - Carrara, Silvia

AU - Galtieri, Piera Alessia

AU - Pellegatta, Gaia

AU - Fugazza, Alessandro

AU - Koleth, Glenn

AU - Emmanuel, James

AU - Anderloni, Andrea

AU - Mori, Yuichi

AU - Wallace, Michael B.

AU - Sharma, Prateek

AU - Repici, Alessandro

PY - 2022/5

Y1 - 2022/5

N2 - Background and Aims: Artificial intelligence has been shown to be effective in polyp detection, and multiple computer-aided detection (CADe) systems have been developed. False-positive (FP) activation emerged as a possible way to benchmark CADe performance in clinical practice. The aim of this study was to validate a previously developed classification of FPs comparing the performances of different brands of approved CADe systems. Methods: We compared 2 different consecutive video libraries (40 video per arm) collected at Humanitas Research Hospital with 2 different CADe system brands (CADe A and CADe B). For each video, the number of CADe false activations, cause, and time spent by the endoscopist to examine the area erroneously highlighted were reported. The FP activations were classified according to the previously developed classification of FPs (the NOISE classification) according to their cause and relevance. Results: In CADe A 1021 FP activations were registered across the 40 videos (25.5 ± 12.2 FPs per colonoscopy), whereas in CADe B 1028 were identified (25.7 ± 13.2 FPs per colonoscopy; P = .53). Among them, 22.9 ± 9.9 (89.8% in CADe A) and 22.1 ± 10.0 (86.0% in CADe B) were because of artifacts from the bowel wall. Conversely, 2.6 ± 1.9 (10.2% in CADe A) and 3.5 ± 2.1 (14% in CADe B) were caused by bowel content (P = .45). Within CADe A each false activation required .2 ± .9 seconds, with 1.6 ± 1.0 FPs (6.3%) requiring additional time for endoscopic assessment. Comparable results were reported within CADe B with .2 ± .8 seconds spent per false activation and 1.8 ± 1.2 FPs per colonoscopy requiring additional inspection. Conclusions: The use of a standardized nomenclature provided comparable results with either of the 2 recently approved CADe systems. (Clinical trial registration number: NCT04399590.)

UR - http://www.scopus.com/inward/record.url?scp=85126548739&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85126548739&partnerID=8YFLogxK

U2 - 10.1016/j.gie.2021.12.031

DO - 10.1016/j.gie.2021.12.031

M3 - Article

C2 - 34995639

AN - SCOPUS:85126548739

SN - 0016-5107

VL - 95

SP - 975-981.e1

JO - Gastrointestinal endoscopy

JF - Gastrointestinal endoscopy

IS - 5

ER -
