Related papers: What's in a Name? Auditing Large Language Models for Race and Gender Bias

URL: http://arxiv.org/abs/2402.14875v2
Date: Thu, 29 Feb 2024 19:39:35 GMT
Title: What's in a Name? Auditing Large Language Models for Race and Gender Bias
Authors: Amit Haim, Alejandro Salinas, Julian Nyarko
Abstract summary: We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
Score: 49.28899492966893
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. In our study, we prompt the models for advice involving a named individual across a variety of scenarios, such as during car purchase negotiations or election outcome predictions. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women. Names associated with Black women receive the least advantageous outcomes. The biases are consistent across 42 prompt templates and several models, indicating a systemic issue rather than isolated incidents. While providing numerical, decision-relevant anchors in the prompt can successfully counteract the biases, qualitative details have inconsistent effects and may even increase disparities. Our findings underscore the importance of conducting audits at the point of LLM deployment and implementation to mitigate their potential for harm against marginalized communities.

Related papers

Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks [0.0]
Large Language Models (LLMs) can exhibit latent biases towards specific nationalities even when explicit demographic markers are not present. We introduce a novel name-based benchmarking approach to investigate the impact of substituting explicit nationality labels with culturally indicative names. Our experiments show that small models are less accurate and exhibit more bias compared to their larger counterparts.
arXiv Detail & Related papers (2025-07-22T19:54:49Z)
Robustly Improving LLM Fairness in Realistic Settings via Interpretability [0.16843915833103415]
Anti-bias prompts fail when realistic contextual details are introduced. We find that adding realistic context such as company names, culture descriptions from public careers pages, and selective hiring constraints induces significant racial and gender biases. Our internal bias mitigation identifies race and gender-correlated directions and applies affine concept editing at inference time.
arXiv Detail & Related papers (2025-06-12T17:34:38Z)
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z)
The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks. Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations. This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs). By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
Prompt and Prejudice [29.35618753825668]
This paper investigates the impact of using first names in Large Language Models (LLMs) and Vision Language Models (VLMs). We propose an approach that appends first names to ethically annotated text scenarios to reveal demographic biases in model outputs.
arXiv Detail & Related papers (2024-08-07T14:11:33Z)
"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations [29.183942575629214]
We utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with 320 first names that strongly signal their race and gender. Our empirical results indicate a preference among these models for hiring candidates with White female-sounding names over other demographic groups across 40 occupations.
arXiv Detail & Related papers (2024-06-18T03:11:43Z)
Uncovering Name-Based Biases in Large Language Models Through Simulated Trust Game [0.0]
Gender and race inferred from an individual's name are a notable source of stereotypes and biases that subtly influence social interactions. We show that our approach can detect name-based biases in both base and instruction-tuned models.
arXiv Detail & Related papers (2024-04-23T02:21:17Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models [40.12132844347926]
We provide a visual and textual bias benchmark called MMBias, consisting of around 3,800 images and phrases covering 14 population subgroups. We utilize this dataset to assess bias in several prominent self supervised multimodal models, including CLIP, ALBEF, and ViLT. We introduce a debiasing method designed specifically for such large pre-trained models that can be applied as a post-processing step to mitigate bias.
arXiv Detail & Related papers (2023-03-16T17:36:37Z)
How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models. We analyze the occupational biases of a popular generative language model, GPT-2. For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors. We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.