Unmasking and Quantifying Racial Bias of Large Language Models in
Medical Report Generation
- URL: http://arxiv.org/abs/2401.13867v1
- Date: Thu, 25 Jan 2024 00:34:16 GMT
- Title: Unmasking and Quantifying Racial Bias of Large Language Models in
Medical Report Generation
- Authors: Yifan Yang, Xiaoyu Liu, Qiao Jin, Furong Huang, Zhiyong Lu
- Abstract summary: Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals.
These models tend to project higher costs and longer hospitalizations for White populations.
These biases mirror real-world healthcare disparities.
- Score: 36.15505795527914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models like GPT-3.5-turbo and GPT-4 hold promise for
healthcare professionals, but they may inadvertently inherit biases during
their training, potentially affecting their utility in medical applications.
Despite a few attempts in the past, the precise impact and extent of these biases
remain uncertain. Through both qualitative and quantitative analyses, we find
that these models tend to project higher costs and longer hospitalizations for
White populations and exhibit optimistic views in challenging medical scenarios
with much higher survival rates. These biases, which mirror real-world
healthcare disparities, are evident in the generation of patient backgrounds,
the association of specific diseases with certain races, and disparities in
treatment recommendations, among other areas. Our findings underscore the critical need for
future research to address and mitigate biases in language models, especially
in critical healthcare applications, to ensure fair and accurate outcomes for
all patients.
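To make the quantification concrete, here is a minimal sketch of the kind of counterfactual prompting the abstract describes: identical clinical vignettes that differ only in the patient's stated race are sent to the model repeatedly, and the projected length of stay and cost are averaged per group. The `query_llm` helper, the vignette wording, and the race labels are illustrative assumptions, not the authors' exact protocol.

```python
import re
import statistics

# Hypothetical helper (an assumption, not part of the paper): send a prompt to
# an LLM such as GPT-3.5-turbo or GPT-4 and return the text of its reply.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM client of choice")

# Illustrative vignette; the race label is the only thing that varies.
CASE = ("A 58-year-old {race} patient presents with community-acquired "
        "pneumonia. Estimate the expected length of stay in days and the total "
        "hospitalization cost in US dollars. Reply in the form 'days: X, cost: Y'.")

def extract_numbers(text: str):
    """Pull the first two numbers (days, cost) out of the model's reply."""
    nums = [float(n.replace(",", "")) for n in re.findall(r"\d[\d,]*\.?\d*", text)]
    return (nums[0], nums[1]) if len(nums) >= 2 else (None, None)

def probe(races=("White", "Black", "Asian", "Hispanic"), n_samples=20):
    """Average projected stay and cost per race; systematic gaps indicate bias."""
    results = {}
    for race in races:
        days, costs = [], []
        for _ in range(n_samples):
            d, c = extract_numbers(query_llm(CASE.format(race=race)))
            if d is not None and c is not None:
                days.append(d)
                costs.append(c)
        results[race] = (statistics.mean(days), statistics.mean(costs))
    return results
```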
Related papers
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision-Language Models (MedVLMs) by adjusting the model structure, fine-tuning on high-quality data, or applying preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - Bias in Large Language Models Across Clinical Applications: A Systematic Review [0.0]
Large language models (LLMs) are rapidly being integrated into healthcare, promising to enhance various clinical tasks.
This systematic review investigates the prevalence, sources, manifestations, and clinical implications of bias in LLMs.
arXiv Detail & Related papers (2025-04-03T13:32:08Z) - Fairness in Computational Innovations: Identifying Bias in Substance Use Treatment Length of Stay Prediction Models with Policy Implications [0.477529483515826]
Predictive machine learning (ML) models are computational innovations that can enhance medical decision-making.
However, societal biases can be encoded into such models, raising concerns about inadvertently affecting health outcomes for disadvantaged groups.
This issue is particularly pressing in the context of substance use disorder (SUD) treatment, where biases in predictive models could significantly impact the recovery of highly vulnerable patients.
arXiv Detail & Related papers (2024-12-08T06:47:23Z) - Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models [10.895429855778747]
We consider the uncertainty quantification of LMs for EHR tasks in white-box and black-box settings.
We show that the proposed multi-tasking and ensemble methods effectively reduce model uncertainty on EHR tasks.
We validate our framework using longitudinal clinical data from more than 6,000 patients in ten clinical prediction tasks.
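One simple way to realize the ensemble idea mentioned above is to train several clinical predictors independently and read the spread of their outputs as an uncertainty signal. The sketch below is an assumption-laden illustration, not the paper's implementation: the model objects and their `predict` interface are hypothetical.

```python
import statistics

def ensemble_uncertainty(models, features):
    """Mean risk and disagreement across an ensemble of clinical predictors.

    `models` is assumed to be a list of independently trained predictors whose
    `predict(features)` returns the probability of the adverse outcome; the
    variance across members is read as a simple uncertainty estimate.
    """
    predictions = [model.predict(features) for model in models]
    return statistics.mean(predictions), statistics.pvariance(predictions)
```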
arXiv Detail & Related papers (2024-11-05T20:20:15Z) - Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation [3.328297368052458]
We tackle bias detection in medical curricula using NLP models, including LLMs.
We evaluate them on a gold-standard dataset of 4,105 excerpts drawn from a large corpus and annotated for bias by medical experts.
arXiv Detail & Related papers (2024-09-11T17:10:20Z) - Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information.
We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets.
Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
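A widely used black-box baseline in such benchmarks is answer consistency: sample several stochastic answers to the same medical question and treat the entropy of the empirical answer distribution as the uncertainty score. The sketch below assumes a hypothetical `sample_answers` helper and illustrates the general technique rather than any specific method from the paper.

```python
import math
from collections import Counter

def sample_answers(question: str, n: int) -> list[str]:
    # Hypothetical helper: draw n answers from the model at temperature > 0.
    raise NotImplementedError("wire this to your LLM client")

def answer_entropy(question: str, n: int = 10) -> float:
    """Entropy of the sampled answer distribution; higher means more uncertain."""
    counts = Counter(a.strip().lower() for a in sample_answers(question, n))
    probs = [c / n for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)
```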
arXiv Detail & Related papers (2024-07-11T16:51:33Z) - Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias [3.455189439319919]
We introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real-world knowledge in large language models (LLMs).
We evaluate how demographic biases embedded in pre-training corpora like The Pile influence the outputs of LLMs.
Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups.
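The kind of misalignment Cross-Care measures can be approximated, very roughly, by comparing how often a disease co-occurs with each demographic term in a corpus against the disease's real prevalence in those groups. The sketch below is a toy illustration under that assumption; the actual benchmark uses far more careful term matching and normalization, and the `prevalence` table would come from external epidemiological data.

```python
def cooccurrence_counts(documents, diseases, groups):
    """Count, per (disease, group) pair, how many documents mention both terms."""
    counts = {(d, g): 0 for d in diseases for g in groups}
    for doc in documents:
        text = doc.lower()
        for d in diseases:
            if d in text:
                for g in groups:
                    if g in text:
                        counts[(d, g)] += 1
    return counts

def representation_gap(counts, prevalence, disease, groups):
    """Corpus co-occurrence share minus real prevalence share, per group.

    `prevalence[(disease, group)]` is an assumed external input; a positive gap
    means the corpus over-represents that disease-group pairing.
    """
    corpus_total = sum(counts[(disease, g)] for g in groups) or 1
    prev_total = sum(prevalence[(disease, g)] for g in groups) or 1
    return {g: counts[(disease, g)] / corpus_total
               - prevalence[(disease, g)] / prev_total
            for g in groups}
```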
arXiv Detail & Related papers (2024-05-09T02:33:14Z) - Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources [1.8259644946867188]
The study analyzed the context in which various diseases are discussed alongside markers of race and gender.
We found that demographic terms are disproportionately associated with specific disease concepts in online texts.
We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed.
arXiv Detail & Related papers (2024-05-08T13:38:56Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
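The audit design boils down to holding the prompt fixed and varying only the name, then comparing the numeric advice across name groups. Below is a compressed sketch under that interpretation; the name lists, the prompt wording, and the `query_llm` callable are all hypothetical placeholders rather than the study's actual materials.

```python
import re
import statistics

# Illustrative name lists only; the audit in the paper draws names that proxy
# race and gender from demographic records.
NAMES = {
    "group_a": ["Name1", "Name2"],
    "group_b": ["Name3", "Name4"],
}

# A purchase-negotiation style prompt (hypothetical wording).
PROMPT = ("I am advising {name} on an initial offer for a used car listed at "
          "$20,000. What dollar amount should {name} offer? Answer with a number.")

def audit(query_llm, n_samples=25):
    """Average the numeric advice per name group; systematic gaps suggest name-based bias."""
    group_means = {}
    for group, names in NAMES.items():
        values = []
        for name in names:
            for _ in range(n_samples):
                match = re.search(r"\d[\d,]*", query_llm(PROMPT.format(name=name)))
                if match:
                    values.append(float(match.group().replace(",", "")))
        group_means[group] = statistics.mean(values)
    return group_means
```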
arXiv Detail & Related papers (2024-02-21T18:25:25Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
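In spirit, such behavioral tests swap one input attribute at a time and record how much the model's prediction moves; the sketch below illustrates this under stated assumptions, with the note text, the substitutions, and the `predict` callable all standing in as placeholders.

```python
def behavioral_test(predict, note: str, substitutions: dict) -> dict:
    """Perturb one attribute at a time in a clinical note and record prediction shifts.

    `predict` is any clinical outcome model mapping free text to a probability;
    large shifts for medically irrelevant edits flag implausible learned patterns.
    """
    baseline = predict(note)
    return {(old, new): predict(note.replace(old, new)) - baseline
            for old, new in substitutions.items()}

# Illustrative use: does predicted in-hospital mortality change when only the
# patient's stated ethnicity in the admission note is edited?
# shifts = behavioral_test(model_predict,
#                          "72-year-old white male admitted with sepsis ...",
#                          {"white": "Black"})
```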
arXiv Detail & Related papers (2021-11-30T15:52:04Z) - MIMIC-IF: Interpretability and Fairness Evaluation of Deep Learning Models on MIMIC-IV Dataset [15.436560770086205]
We focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset.
We conduct comprehensive analyses of dataset representation bias as well as interpretability and prediction fairness of deep learning models for in-hospital mortality prediction.
arXiv Detail & Related papers (2021-02-12T20:28:06Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.