Related papers: Mitigating the Risk of Health Inequity Exacerbated by Large Language Models

Mitigating the Risk of Health Inequity Exacerbated by Large Language Models

URL: http://arxiv.org/abs/2410.05180v2
Date: Mon, 14 Oct 2024 14:27:34 GMT
Title: Mitigating the Risk of Health Inequity Exacerbated by Large Language Models
Authors: Yuelyu Ji, Wenhe Ma, Sonish Sivarajkumar, Hang Zhang, Eugene Mathew Sadhu, Zhuochun Li, Xizhi Wu, Shyam Visweswaran, Yanshan Wang,
Abstract summary: We show that incorporating non decisive sociodemographic factors into the input of large language models can lead to incorrect and harmful outputs. We introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM based medical applications.
Score: 5.02540629164568
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.

Related papers

Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns. Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance. We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare [5.765614539740084]
The application of large language models (LLMs) in healthcare has the potential to revolutionize clinical decision-making, medical research, and patient care. As LLMs are increasingly integrated into healthcare systems, several critical challenges must be addressed to ensure their reliable and ethical deployment.
arXiv Detail & Related papers (2025-02-21T18:43:06Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? [0.0]
This study investigates the prevalence and impact of ASR errors in medical transcription in Nigeria, the United Kingdom, and the United States. We assess the potential and limitations of Large Language Models to address challenges related to accents and medical terminology in ASR.
arXiv Detail & Related papers (2025-01-25T19:40:26Z)
Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective [7.1047384702030625]
We evaluate state-of-the-art large language models (LLMs) with three prevalent learning frameworks across six diverse healthcare tasks. We find significant challenges in applying LLMs to real-world healthcare tasks and persistent fairness issues across demographic groups.
arXiv Detail & Related papers (2024-11-30T18:52:30Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios [50.032101237019205]
CliMedBench is a comprehensive benchmark with 14 expert-guided core clinical scenarios. The reliability of this benchmark has been confirmed in several ways.
arXiv Detail & Related papers (2024-10-04T15:15:36Z)
Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset [3.4146360486107987]
Large Language Models (LLMs) are increasingly integrated into various medical fields, including mental health support systems. We present a novel multilingual adaptation of widely-used mental health datasets, translated from English into six languages. This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages.
arXiv Detail & Related papers (2024-09-25T22:14:34Z)
Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective [3.724351094182122]
This paper presents a comprehensive scientometric analysis of large language models (LLMs) research for healthcare. Our findings highlight significant gender and geographic disparities, with a predominance of male authors. We propose actionable strategies to enhance diversity and inclusivity in artificial intelligence research.
arXiv Detail & Related papers (2024-06-19T02:00:51Z)
A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models [20.11590976578911]
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions.
arXiv Detail & Related papers (2024-03-18T17:56:37Z)
Large language models in healthcare and medical domain: A review [4.456243157307507]
Large language models (LLMs) provide proficient responses to free-text queries. This review explores the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications.
arXiv Detail & Related papers (2023-12-12T20:54:51Z)
Self-Diagnosis and Large Language Models: A New Front for Medical Misinformation [8.738092015092207]
We evaluate the capabilities of large language models (LLMs) from the lens of a general user self-diagnosing. We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases. We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM) Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
Fair Machine Learning in Healthcare: A Review [90.22219142430146]
We analyze the intersection of fairness in machine learning and healthcare disparities. We provide a critical review of the associated fairness metrics from a machine learning standpoint. We propose several new research directions that hold promise for developing ethical and equitable ML applications in healthcare.
arXiv Detail & Related papers (2022-06-29T04:32:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.