Related papers: Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation

Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation

URL: http://arxiv.org/abs/2401.13867v1
Date: Thu, 25 Jan 2024 00:34:16 GMT
Title: Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation
Authors: Yifan Yang, Xiaoyu Liu, Qiao Jin, Furong Huang, Zhiyong Lu
Abstract summary: Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals. These models tend to project higher costs and longer hospitalizations for White populations. These biases mirror real-world healthcare disparities.
Score: 36.15505795527914
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their training, potentially affecting their utility in medical applications. Despite few attempts in the past, the precise impact and extent of these biases remain uncertain. Through both qualitative and quantitative analyses, we find that these models tend to project higher costs and longer hospitalizations for White populations and exhibit optimistic views in challenging medical scenarios with much higher survival rates. These biases, which mirror real-world healthcare disparities, are evident in the generation of patient backgrounds, the association of specific diseases with certain races, and disparities in treatment recommendations, etc. Our findings underscore the critical need for future research to address and mitigate biases in language models, especially in critical healthcare applications, to ensure fair and accurate outcomes for all patients.

Related papers

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
Bias in Large Language Models Across Clinical Applications: A Systematic Review [0.0]
Large language models (LLMs) are rapidly being integrated into healthcare, promising to enhance various clinical tasks. This systematic review investigates the prevalence, sources, manifestations, and clinical implications of bias in LLMs.
arXiv Detail & Related papers (2025-04-03T13:32:08Z)
Fairness in Computational Innovations: Identifying Bias in Substance Use Treatment Length of Stay Prediction Models with Policy Implications [0.477529483515826]
Predictive machine learning (ML) models are computational innovations that can enhance medical decision-making. However, societal biases can be encoded into such models, raising concerns about inadvertently affecting health outcomes for disadvantaged groups. This issue is particularly pressing in the context of substance use disorder (SUD) treatment, where biases in predictive models could significantly impact the recovery of highly vulnerable patients.
arXiv Detail & Related papers (2024-12-08T06:47:23Z)
Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models [10.895429855778747]
We consider the uncertainty quantification of LMs for EHR tasks in white-box and black-box settings. We show that an effective reduction of model uncertainty can be achieved by using the proposed multi-tasking and ensemble methods in EHRs. We validate our framework using longitudinal clinical data from more than 6,000 patients in ten clinical prediction tasks.
arXiv Detail & Related papers (2024-11-05T20:20:15Z)
Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation [3.328297368052458]
We tackle bias detection in medical curricula using NLP models, including LLMs. We evaluate them on a gold standard dataset containing 4,105 excerpts annotated by medical experts for bias from a large corpus.
arXiv Detail & Related papers (2024-09-11T17:10:20Z)
Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z)
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias [3.455189439319919]
We introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real world knowledge in large language models (LLMs) We evaluate how demographic biases embedded in pre-training corpora like $ThePile$ influence the outputs of LLMs. Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups.
arXiv Detail & Related papers (2024-05-09T02:33:14Z)
Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources [1.8259644946867188]
The study analyzed the context in which various diseases are discussed alongside markers of race and gender. We found that demographic terms are disproportionately associated with specific disease concepts in online texts. We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed.
arXiv Detail & Related papers (2024-05-08T13:38:56Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input. We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z)
MIMIC-IF: Interpretability and Fairness Evaluation of Deep Learning Models on MIMIC-IV Dataset [15.436560770086205]
We focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset. We conduct comprehensive analyses of dataset representation bias as well as interpretability and prediction fairness of deep learning models for in-hospital mortality prediction.
arXiv Detail & Related papers (2021-02-12T20:28:06Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD) UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.