Assessing socioeconomic bias in machine learning algorithms in health care: A case study of the HOUSES index

Young J. Juhn; Euijung Ryu; Chung Il Wi; Katherine S. King; Momin Malik; Santiago Romero-Brufau; Chunhua Weng; Sunghwan Sohn; Richard R. Sharp; John D. Halamka

doi:10.1093/jamia/ocac052

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect the underserved. We aimed to assess the degree to which data quality of electronic health records (EHRs) affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES. Materials and Methods: This study utilized existing machine learning models for predicting asthma exacerbation in children with asthma. We compared balanced error rate (BER) against different SES levels measured by HOUsing-based SocioEconomic Status measure (HOUSES) index. As a possible mechanism for differential performance, we also compared incompleteness of EHR information relevant to asthma care by SES. Results: Asthmatic children with lower SES had larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1 vs Q2-Q4) and had a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria). Discussion: Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias. Conclusion: The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample size and a single-site study, the study results highlight a potential strategy for identifying bias by using an innovative SES measure.
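The comparison described in the abstract (BER within each SES group, then a ratio between groups) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common definition of BER as the mean of the false positive rate and the false negative rate for binary labels, which the abstract does not spell out. The function names, the group labels "Q1" and "Q2-Q4" as strings, and the toy labels are all hypothetical and do not reproduce any study data.

```python
from typing import Dict, Hashable, Sequence


def balanced_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of false positive rate and false negative rate for 0/1 labels.

    Assumed definition; the abstract does not state the exact formula used.
    """
    neg = sum(1 for t in y_true if t == 0)
    pos = sum(1 for t in y_true if t == 1)
    if neg == 0 or pos == 0:
        raise ValueError("both classes must be present to compute BER")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.5 * (fp / neg + fn / pos)


def ber_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute BER separately within each group label."""
    result: Dict[Hashable, float] = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = balanced_error_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return result


# Synthetic toy data for illustration only (NOT from the study).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["Q1"] * 4 + ["Q2-Q4"] * 4

bers = ber_by_group(y_true, y_pred, groups)
# Ratio above 1 means the lower-SES group has the larger error rate.
ratio = bers["Q1"] / bers["Q2-Q4"]
```

A ratio such as the 1.35 reported for HOUSES Q1 vs Q2-Q4 would be read the same way: the model's balanced error is that many times larger in the lowest SES quartile than in the remaining quartiles.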

Original language	English (US)
Pages (from-to)	1142-1151
Number of pages	10
Journal	Journal of the American Medical Informatics Association
Volume	29
Issue number	7
DOIs	https://doi.org/10.1093/jamia/ocac052
State	Published - Jul 1 2022

Keywords

HOUSES
algorithmic bias
artificial intelligence
electronic health records
social determinants of health

ASJC Scopus subject areas

Health Informatics

Cite this

@article{b0957b95260b4b55bca9aed345151f3a,

title = "Assessing socioeconomic bias in machine learning algorithms in health care: A case study of the HOUSES index",

abstract = "Objective: Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect the underserved. We aimed to assess the degree to which data quality of electronic health records (EHRs) affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES. Materials and Methods: This study utilized existing machine learning models for predicting asthma exacerbation in children with asthma. We compared balanced error rate (BER) against different SES levels measured by HOUsing-based SocioEconomic Status measure (HOUSES) index. As a possible mechanism for differential performance, we also compared incompleteness of EHR information relevant to asthma care by SES. Results: Asthmatic children with lower SES had larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1 vs Q2-Q4) and had a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria). Discussion: Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias. Conclusion: The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample size and a single-site study, the study results highlight a potential strategy for identifying bias by using an innovative SES measure.",

keywords = "HOUSES, algorithmic bias, artificial intelligence, electronic health records, social determinants of health",

author = "Juhn, {Young J.} and Euijung Ryu and Wi, {Chung Il} and King, {Katherine S.} and Momin Malik and Santiago Romero-Brufau and Chunhua Weng and Sunghwan Sohn and Sharp, {Richard R.} and Halamka, {John D.}",

year = "2022",

month = jul,

day = "1",

doi = "10.1093/jamia/ocac052",

language = "English (US)",

volume = "29",

pages = "1142--1151",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "7",

}

TY - JOUR

T1 - Assessing socioeconomic bias in machine learning algorithms in health care

T2 - A case study of the HOUSES index

AU - Juhn, Young J.

AU - Ryu, Euijung

AU - Wi, Chung Il

AU - King, Katherine S.

AU - Malik, Momin

AU - Romero-Brufau, Santiago

AU - Weng, Chunhua

AU - Sohn, Sunghwan

AU - Sharp, Richard R.

AU - Halamka, John D.

PY - 2022/7/1

Y1 - 2022/7/1

N2 - Objective: Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect the underserved. We aimed to assess the degree to which data quality of electronic health records (EHRs) affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES. Materials and Methods: This study utilized existing machine learning models for predicting asthma exacerbation in children with asthma. We compared balanced error rate (BER) against different SES levels measured by HOUsing-based SocioEconomic Status measure (HOUSES) index. As a possible mechanism for differential performance, we also compared incompleteness of EHR information relevant to asthma care by SES. Results: Asthmatic children with lower SES had larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1 vs Q2-Q4) and had a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria). Discussion: Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias. Conclusion: The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample size and a single-site study, the study results highlight a potential strategy for identifying bias by using an innovative SES measure.

AB - Objective: Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect the underserved. We aimed to assess the degree to which data quality of electronic health records (EHRs) affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES. Materials and Methods: This study utilized existing machine learning models for predicting asthma exacerbation in children with asthma. We compared balanced error rate (BER) against different SES levels measured by HOUsing-based SocioEconomic Status measure (HOUSES) index. As a possible mechanism for differential performance, we also compared incompleteness of EHR information relevant to asthma care by SES. Results: Asthmatic children with lower SES had larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1 vs Q2-Q4) and had a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria). Discussion: Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias. Conclusion: The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample size and a single-site study, the study results highlight a potential strategy for identifying bias by using an innovative SES measure.

KW - HOUSES

KW - algorithmic bias

KW - artificial intelligence

KW - electronic health records

KW - social determinants of health

UR - http://www.scopus.com/inward/record.url?scp=85132049840&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85132049840&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocac052

DO - 10.1093/jamia/ocac052

M3 - Article

C2 - 35396996

AN - SCOPUS:85132049840

SN - 1067-5027

VL - 29

SP - 1142

EP - 1151

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 7

ER -
