AI recognition of patient race in medical imaging: a modelling study

Judy Wawira Gichoya; Imon Banerjee; Ananth Reddy Bhimireddy; John L. Burns; Leo Anthony Celi; Li Ching Chen; Ramon Correa; Natalie Dullerud; Marzyeh Ghassemi; Shih Cheng Huang; Po Chih Kuo; Matthew P. Lungren; Lyle J. Palmer; Brandon J. Price; Saptarshi Purkayastha; Ayis T. Pyrros; Lauren Oakden-Rayner; Chima Okechukwu; Laleh Seyyed-Kalantari; Hari Trivedi; Ryan Wang; Zachary Zaiman; Haoran Zhang

doi:10.1016/S2589-7500(22)00063-2

Diagnostic Radiology

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images.

Methods: Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race.

Findings: In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91–0·99], CT chest imaging [0·87–0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study.

Interpretation: The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging.
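The AUC values quoted in the abstract can be read on a scale where 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The following is a minimal sketch of that metric, not the authors' evaluation code: the function `auc` and the toy scores are hypothetical and invented purely for illustration. It uses the rank (Mann-Whitney) formulation, in which AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative one.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up scores (NOT data from the study).
labels = [1, 1, 1, 0, 0, 0]
separating = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]     # every positive outranks every negative
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # all scores tied

print(auc(separating, labels))     # 1.0
print(auc(uninformative, labels))  # 0.5
```

On this scale, the reported AUC of 0·55 for body-mass index sits close to chance, whereas the image-based models at 0·81 to 0·99 rank cases far more reliably.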

Original language	English (US)
Pages (from-to)	e406-e414
Journal	The Lancet Digital Health
Volume	4
Issue number	6
DOIs	https://doi.org/10.1016/S2589-7500(22)00063-2
State	Published - Jun 2022

ASJC Scopus subject areas

Medicine (miscellaneous)
Health Informatics
Decision Sciences (miscellaneous)
Health Information Management

Access to Document

10.1016/S2589-7500(22)00063-2

Cite this

Gichoya, J. W., Banerjee, I., Bhimireddy, A. R., Burns, J. L., Celi, L. A., Chen, L. C., Correa, R., Dullerud, N., Ghassemi, M., Huang, S. C., Kuo, P. C., Lungren, M. P., Palmer, L. J., Price, B. J., Purkayastha, S., Pyrros, A. T., Oakden-Rayner, L., Okechukwu, C., Seyyed-Kalantari, L., ... Zhang, H. (2022). AI recognition of patient race in medical imaging: a modelling study. The Lancet Digital Health, 4(6), e406-e414. https://doi.org/10.1016/S2589-7500(22)00063-2

Gichoya, JW, Banerjee, I, Bhimireddy, AR, Burns, JL, Celi, LA, Chen, LC, Correa, R, Dullerud, N, Ghassemi, M, Huang, SC, Kuo, PC, Lungren, MP, Palmer, LJ, Price, BJ, Purkayastha, S, Pyrros, AT, Oakden-Rayner, L, Okechukwu, C, Seyyed-Kalantari, L, Trivedi, H, Wang, R, Zaiman, Z & Zhang, H 2022, 'AI recognition of patient race in medical imaging: a modelling study', The Lancet Digital Health, vol. 4, no. 6, pp. e406-e414. https://doi.org/10.1016/S2589-7500(22)00063-2

@article{d7c18fa385a347edbd21a3b15c839f7a,

title = "AI recognition of patient race in medical imaging: a modelling study",

abstract = "Background: Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. Methods: Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. Findings: In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91–0·99], CT chest imaging [0·87–0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). 
Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study. Interpretation: The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging.",

author = "Gichoya, {Judy Wawira} and Imon Banerjee and Bhimireddy, {Ananth Reddy} and Burns, {John L.} and Celi, {Leo Anthony} and Chen, {Li Ching} and Ramon Correa and Natalie Dullerud and Marzyeh Ghassemi and Huang, {Shih Cheng} and Kuo, {Po Chih} and Lungren, {Matthew P.} and Palmer, {Lyle J.} and Price, {Brandon J.} and Saptarshi Purkayastha and Pyrros, {Ayis T.} and Lauren Oakden-Rayner and Chima Okechukwu and Laleh Seyyed-Kalantari and Hari Trivedi and Ryan Wang and Zachary Zaiman and Haoran Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license",

year = "2022",

month = jun,

doi = "10.1016/S2589-7500(22)00063-2",

language = "English (US)",

volume = "4",

pages = "e406--e414",

journal = "The Lancet Digital Health",

issn = "2589-7500",

publisher = "Elsevier Ltd",

number = "6",

}

TY - JOUR

T1 - AI recognition of patient race in medical imaging

T2 - a modelling study

AU - Gichoya, Judy Wawira

AU - Banerjee, Imon

AU - Bhimireddy, Ananth Reddy

AU - Burns, John L.

AU - Celi, Leo Anthony

AU - Chen, Li Ching

AU - Correa, Ramon

AU - Dullerud, Natalie

AU - Ghassemi, Marzyeh

AU - Huang, Shih Cheng

AU - Kuo, Po Chih

AU - Lungren, Matthew P.

AU - Palmer, Lyle J.

AU - Price, Brandon J.

AU - Purkayastha, Saptarshi

AU - Pyrros, Ayis T.

AU - Oakden-Rayner, Lauren

AU - Okechukwu, Chima

AU - Seyyed-Kalantari, Laleh

AU - Trivedi, Hari

AU - Wang, Ryan

AU - Zaiman, Zachary

AU - Zhang, Haoran

PY - 2022/6

Y1 - 2022/6

N2 - Background: Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. Methods: Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. Findings: In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91–0·99], CT chest imaging [0·87–0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study. Interpretation: The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging.

UR - http://www.scopus.com/inward/record.url?scp=85130545199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85130545199&partnerID=8YFLogxK

U2 - 10.1016/S2589-7500(22)00063-2

DO - 10.1016/S2589-7500(22)00063-2

M3 - Article

C2 - 35568690

AN - SCOPUS:85130545199

SN - 2589-7500

VL - 4

SP - e406

EP - e414

JO - The Lancet Digital Health

JF - The Lancet Digital Health

IS - 6

ER -
