Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients

Michael D. Kuo; Keith W.H. Chiu; David S. Wang; Anna Rita Larici; Dmytro Poplavskiy; Adele Valentini; Alessandro Napoli; Andrea Borghesi; Guido Ligabue; Xin Hao B. Fang; Hing Ki C. Wong; Sailong Zhang; John R. Hunter; Abeer Mousa; Amato Infante; Lorenzo Elia; Salvatore Golemi; Leung Ho P. Yu; Christopher K.M. Hui; Bradley J. Erickson

doi:10.1007/s00330-022-08969-z

Radiology

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: While chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR.

Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operator characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33 to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases.

Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95%CI 0.78–0.80) on an independent test cohort of 5,894 patients. Delong’s test showed statistical differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed the negative predictive value increases from 86.1% at 33.3% prevalence, to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, McNemar’s test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001).

Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR.

Key Points:
• An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets.
• Differences in AI model performance were seen across region, disease severity, gender, and age.
• Prevalence simulations on the international test set demonstrate the model’s NPV is greater than 98.5% at any prevalence below 4.5%.
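
The dependence of negative predictive value (NPV) on prevalence reported in the abstract follows from Bayes' rule. The short Python sketch below illustrates that relationship in closed form. It is not the authors' bootstrapped simulation, and the sensitivity and specificity values are hypothetical placeholders, since the abstract does not report the model's operating point.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule.

    NPV = P(no disease | negative test)
        = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    """
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point, NOT taken from the paper.
SENS, SPEC = 0.80, 0.70

# Prevalence range quoted in the abstract (3.33% to 33.3%).
for prev in (0.0333, 0.045, 0.10, 0.333):
    print(f"prevalence={prev:.3f}  NPV={npv(SENS, SPEC, prev):.3f}")
```

For any fixed sensitivity and specificity, NPV rises as prevalence falls, which is the qualitative pattern the prevalence simulations describe.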

Original language	English (US)
Pages (from-to)	23-33
Number of pages	11
Journal	European Radiology
Volume	33
Issue number	1
DOIs	https://doi.org/10.1007/s00330-022-08969-z
State	Published - Jan 2023

Keywords

Artificial intelligence
COVID-19
Public health
Radiology
Thoracic

ASJC Scopus subject areas

Radiology Nuclear Medicine and imaging

Cite this

Kuo, M. D., Chiu, K. W. H., Wang, D. S., Larici, A. R., Poplavskiy, D., Valentini, A., Napoli, A., Borghesi, A., Ligabue, G., Fang, X. H. B., Wong, H. K. C., Zhang, S., Hunter, J. R., Mousa, A., Infante, A., Elia, L., Golemi, S., Yu, L. H. P., Hui, C. K. M., & Erickson, B. J. (2023). Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients. European Radiology, 33(1), 23-33. https://doi.org/10.1007/s00330-022-08969-z

Kuo, MD, Chiu, KWH, Wang, DS, Larici, AR, Poplavskiy, D, Valentini, A, Napoli, A, Borghesi, A, Ligabue, G, Fang, XHB, Wong, HKC, Zhang, S, Hunter, JR, Mousa, A, Infante, A, Elia, L, Golemi, S, Yu, LHP, Hui, CKM & Erickson, BJ 2023, 'Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients', European Radiology, vol. 33, no. 1, pp. 23-33. https://doi.org/10.1007/s00330-022-08969-z

@article{e5348fc8a1b8489aa48421197e710d9b,
  title = "Multi-center validation of an artificial intelligence system for detection of {COVID-19} on chest radiographs in symptomatic patients",
  abstract = "Objectives: While chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR. Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operator characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33 to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases. Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95%CI 0.78–0.80) on an independent test cohort of 5,894 patients. Delong{\textquoteright}s test showed statistical differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed the negative predictive value increases from 86.1% at 33.3% prevalence, to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, McNemar{\textquoteright}s test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001). Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR. Key Points: • An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets. • Differences in AI model performance were seen across region, disease severity, gender, and age. • Prevalence simulations on the international test set demonstrate the model{\textquoteright}s NPV is greater than 98.5% at any prevalence below 4.5%.",
  keywords = "Artificial intelligence, COVID-19, Public health, Radiology, Thoracic",
  author = "Kuo, {Michael D.} and Chiu, {Keith W.H.} and Wang, {David S.} and Larici, {Anna Rita} and Dmytro Poplavskiy and Adele Valentini and Alessandro Napoli and Andrea Borghesi and Guido Ligabue and Fang, {Xin Hao B.} and Wong, {Hing Ki C.} and Sailong Zhang and Hunter, {John R.} and Abeer Mousa and Amato Infante and Lorenzo Elia and Salvatore Golemi and Yu, {Leung Ho P.} and Hui, {Christopher K.M.} and Erickson, {Bradley J.}",
  note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to European Society of Radiology.",
  year = "2023",
  month = jan,
  doi = "10.1007/s00330-022-08969-z",
  language = "English (US)",
  volume = "33",
  pages = "23--33",
  journal = "European Radiology",
  issn = "0938-7994",
  publisher = "Springer Verlag",
  number = "1",
}

TY - JOUR

T1 - Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients

AU - Kuo, Michael D.

AU - Chiu, Keith W.H.

AU - Wang, David S.

AU - Larici, Anna Rita

AU - Poplavskiy, Dmytro

AU - Valentini, Adele

AU - Napoli, Alessandro

AU - Borghesi, Andrea

AU - Ligabue, Guido

AU - Fang, Xin Hao B.

AU - Wong, Hing Ki C.

AU - Zhang, Sailong

AU - Hunter, John R.

AU - Mousa, Abeer

AU - Infante, Amato

AU - Elia, Lorenzo

AU - Golemi, Salvatore

AU - Yu, Leung Ho P.

AU - Hui, Christopher K.M.

AU - Erickson, Bradley J.

PY - 2023/1

Y1 - 2023/1

AB - Objectives: While chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR. Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operator characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33 to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases. Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95%CI 0.78–0.80) on an independent test cohort of 5,894 patients. Delong’s test showed statistical differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed the negative predictive value increases from 86.1% at 33.3% prevalence, to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, McNemar’s test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001). Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR. Key Points: • An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets. 
• Differences in AI model performance were seen across region, disease severity, gender, and age. • Prevalence simulations on the international test set demonstrate the model’s NPV is greater than 98.5% at any prevalence below 4.5%.

KW - Artificial intelligence

KW - COVID-19

KW - Public health

KW - Radiology

KW - Thoracic

UR - http://www.scopus.com/inward/record.url?scp=85133274530&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85133274530&partnerID=8YFLogxK

U2 - 10.1007/s00330-022-08969-z

DO - 10.1007/s00330-022-08969-z

M3 - Article

C2 - 35779089

AN - SCOPUS:85133274530

SN - 0938-7994

VL - 33

SP - 23

EP - 33

JO - European Radiology

JF - European Radiology

IS - 1

ER -
