Mitigating Bias in Radiology Machine Learning: 1. Data Handling

Pouria Rouzrokh; Bardia Khosravi; Shahriar Faghani; Mana Moassefi; Diana V. Vera Garcia; Yashbir Singh; Kuan Zhang; Gian Marco Conte; Bradley J. Erickson

doi:10.1148/ryai.210290

Radiology

Research output: Contribution to journal › Article › peer-review

Abstract

Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the suboptimal practices and steps to prevent them.

Original language	English (US)
Article number	e210290
Journal	Radiology: Artificial Intelligence
Volume	4
Issue number	5
DOIs	https://doi.org/10.1148/ryai.210290
State	Published - Sep 2022

Keywords

Bias
Computer-aided Diagnosis (CAD)
Convolutional Neural Network (CNN)
Data Handling
Deep Learning
Machine Learning

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging
Artificial Intelligence

Access to Document

10.1148/ryai.210290

Cite this

@article{a2f794d890624410a36d6243a6bb0568,

title = "Mitigating Bias in Radiology Machine Learning: 1. Data Handling",

abstract = "Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the sub-optimal practices and steps to prevent them.",

keywords = "Bias, Computer-aided Diagnosis (CAD), Convolutional Neural Network (CNN), Data Handling, Deep Learning, Machine Learning",

author = "Pouria Rouzrokh and Bardia Khosravi and Shahriar Faghani and Mana Moassefi and Garcia, {Diana V. Vera} and Yashbir Singh and Kuan Zhang and Conte, {Gian Marco} and Erickson, {Bradley J.}",

note = "Publisher Copyright: {\textcopyright} RSNA, 2022.",

year = "2022",

month = sep,

doi = "10.1148/ryai.210290",

language = "English (US)",

volume = "4",

journal = "Radiology: Artificial Intelligence",

issn = "2638-6100",

publisher = "Radiological Society of North America Inc.",

number = "5",

}

TY - JOUR

T1 - Mitigating Bias in Radiology Machine Learning

T2 - 1. Data Handling

AU - Rouzrokh, Pouria

AU - Khosravi, Bardia

AU - Faghani, Shahriar

AU - Moassefi, Mana

AU - Garcia, Diana V. Vera

AU - Singh, Yashbir

AU - Zhang, Kuan

AU - Conte, Gian Marco

AU - Erickson, Bradley J.

N1 - Publisher Copyright: © RSNA, 2022.

PY - 2022/9

Y1 - 2022/9

N2 - Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the sub-optimal practices and steps to prevent them.

AB - Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the sub-optimal practices and steps to prevent them.

KW - Bias

KW - Computer-aided Diagnosis (CAD)

KW - Convolutional Neural Network (CNN)

KW - Data Handling

KW - Deep Learning

KW - Machine Learning

UR - http://www.scopus.com/inward/record.url?scp=85139164638&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85139164638&partnerID=8YFLogxK

U2 - 10.1148/ryai.210290

DO - 10.1148/ryai.210290

M3 - Article

AN - SCOPUS:85139164638

SN - 2638-6100

VL - 4

JO - Radiology: Artificial Intelligence

JF - Radiology: Artificial Intelligence

IS - 5

M1 - e210290

ER -
