352: Impact of Unknown Race/Ethnicity on Mortality and Use of Machine Learning for Ethnicity Prediction.

Document Type

Conference Proceeding

Publication Date

2025

Publication Title

Critical Care Medicine

Abstract

Introduction: Incomplete race data in clinical databases may distort measured racial disparities in ICU outcomes such as mortality and may lead to algorithmic bias. Our study compares clinical outcomes between patients with unknown race/ethnicity (UR/E) and those with known race/ethnicity (KR/E). We also use machine learning to predict mortality, identify mortality-related variables, and re-identify R/E. The aim is to evaluate the impact of incomplete R/E data on ICU outcomes.

Methods: This retrospective study used the MIMIC-III dataset, including patients aged 18 and older and excluding those with multiple ICU admissions. The primary outcome was 90-day mortality; secondary outcomes included ventilator-free and vasopressor-free days. A Light Gradient Boosting Machine (LightGBM) was used to predict mortality, and mutual information analysis was used to identify the most important variables affecting mortality. Empirical Bayes factors were calculated to assess model performance disparities between UR/E and KR/E. Clustering techniques were used to characterize the UR/E and KR/E groups. A large language model, Llama 3, was used to re-identify R/E from discharge notes.
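The mutual information step can be illustrated with a minimal sketch. This is not the authors' code: the function names (`mutual_information`, `rank_features`), the feature names, and the toy data are all hypothetical, and the sketch assumes discrete variables, whereas the study's actual variables and estimator are not described in the abstract.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def rank_features(features, outcome):
    """Return (feature name, MI with outcome) pairs, highest MI first."""
    scores = {name: mutual_information(vals, outcome)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: 8 patients, 1 = died within 90 days.
died = [1, 1, 1, 0, 0, 0, 0, 0]
toy_features = {
    "on_vasopressor": [1, 1, 1, 0, 0, 0, 0, 0],  # tracks the outcome exactly
    "weekend_admit":  [1, 0, 1, 0, 1, 0, 1, 0],  # only weakly related
}
ranking = rank_features(toy_features, died)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Running the same ranking separately on the UR/E and KR/E subsets would allow the two lists of top variables to be compared, which is one plausible reading of the analysis described above.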

Results: The study included 38,578 patients, of whom 12.56% had UR/E and 87.44% had KR/E. Univariate analysis showed a hazard ratio of 1.20 for UR/E vs KR/E for 90-day in-hospital mortality, while multivariate analysis showed a hazard ratio of 1.38. The LightGBM model showed strong discriminative power in both subgroups (AUROC > 0.8) but had a significantly higher false positive rate in the UR/E group based on the empirical Bayes factors. Mutual information analysis identified variables predictive of mortality in UR/E compared with KR/E. Ethnicity clustering using principal component analysis on mortality-related variables showed similarities between “unknown” and “white.” Llama 3 identified R/E in 10% of the UR/E patients with nearly 80% accuracy, as confirmed by manual validation.
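The subgroup comparison of discrimination and false positive rate can be sketched as follows. This is an illustration only: the scores and outcomes are invented toy values, the 0.5 decision threshold is an assumption, and the sketch reports raw per-group metrics rather than the empirical Bayes factors used in the study.

```python
def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN) among patients who did not die."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else float("nan")

def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions: (observed 90-day death, predicted risk) per group.
groups = {
    "KR/E": ([1, 1, 0, 0, 0, 0], [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]),
    "UR/E": ([1, 1, 0, 0, 0, 0], [0.8, 0.4, 0.7, 0.6, 0.2, 0.1]),
}
THRESHOLD = 0.5  # assumed cut-off for calling a patient "predicted to die"

results = {}
for name, (y_true, scores) in groups.items():
    y_pred = [1 if s >= THRESHOLD else 0 for s in scores]
    results[name] = {
        "auroc": auroc(y_true, scores),
        "fpr": false_positive_rate(y_true, y_pred),
    }
    print(name, results[name])
```

A model can keep a similar ranking quality in both groups while still flagging more survivors as high risk in one of them, which is why the two metrics are reported side by side.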

Conclusions: In our analysis, patients with UR/E had worse outcomes, with higher 90-day mortality than patients with KR/E. Machine learning models performed differently at mortality prediction depending on R/E status and identified important mortality variables. Ethnicity clustering and NLP methods for R/E identification need further development and validation.

Volume

53

Issue

1 Suppl.

DOI

10.1097/01.ccm.0001100072.07462.4a

ISSN

1530-0293
