How Deep Is Representational Bias in LLMs? The Cases of Caste and Religion
- URL: http://arxiv.org/abs/2508.03712v1
- Date: Tue, 22 Jul 2025 17:28:37 GMT
- Title: How Deep Is Representational Bias in LLMs? The Cases of Caste and Religion
- Authors: Agrima Seth, Monojit Choudhury, Sunayana Sitaram, Kentaro Toyama, Aditya Vashistha, Kalika Bali
- Abstract summary: We quantify the presence and "stickiness" of representational bias in large language models for religion and caste. We find that GPT-4 responses consistently overrepresent culturally dominant groups far beyond their statistical representation. Our findings suggest that representational bias in LLMs has a winner-take-all quality that is more biased than the likely distribution bias in their training data.
- Score: 25.340454708475754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representational bias in large language models (LLMs) has predominantly been measured through single-response interactions and has focused on Global North-centric identities like race and gender. We expand on that research by conducting a systematic audit of GPT-4 Turbo to reveal how deeply encoded representational biases are and how they extend to less-explored dimensions of identity. We prompt GPT-4 Turbo to generate over 7,200 stories about significant life events (such as weddings) in India, using prompts designed to encourage diversity to varying extents. Comparing the diversity of religious and caste representation in the outputs against the actual population distribution in India as recorded in census data, we quantify the presence and "stickiness" of representational bias in the LLM for religion and caste. We find that GPT-4 responses consistently overrepresent culturally dominant groups far beyond their statistical representation, despite prompts intended to encourage representational diversity. Our findings also suggest that representational bias in LLMs has a winner-take-all quality that is more biased than the likely distribution bias in their training data, and repeated prompt-based nudges have limited and inconsistent efficacy in dislodging these biases. These results suggest that diversifying training data alone may not be sufficient to correct LLM bias, highlighting the need for more fundamental changes in model development. Dataset and Codebook: https://github.com/agrimaseth/How-Deep-Is-Representational-Bias-in-LLMs
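The paper's core measurement, comparing the distribution of groups in generated stories against their census shares, can be sketched as follows. All counts and shares below are illustrative placeholders, not the paper's data; the census-style shares are rounded approximations normalized over four groups:

```python
import math
from collections import Counter

# Hypothetical counts of religious groups appearing across ~7,200
# generated stories (invented numbers for illustration only).
generated = Counter({"Hindu": 6500, "Muslim": 400, "Christian": 200, "Sikh": 100})

# Approximate population shares (2011-census style, rounded), normalized
# so the four groups considered here sum to 1.
census_shares = {"Hindu": 0.799, "Muslim": 0.142, "Christian": 0.023, "Sikh": 0.017}
norm = sum(census_shares.values())
census = {g: s / norm for g, s in census_shares.items()}

# Distribution of groups in the model's outputs.
total = sum(generated.values())
output_dist = {g: generated[g] / total for g in census}

# Overrepresentation ratio: > 1 means the group appears more often in
# the outputs than its census share would predict.
ratios = {g: output_dist[g] / census[g] for g in census}

# KL divergence from the census distribution to the output distribution,
# as one scalar summary of representational skew.
kl = sum(census[g] * math.log(census[g] / output_dist[g]) for g in census)
```

With numbers skewed toward the dominant group, the ratio for that group exceeds 1 while minority ratios fall below 1, and the divergence is strictly positive; a prompt-based "diversity nudge" would be evaluated by recomputing these quantities on the nudged outputs.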
Related papers
- Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation [0.0]
Large Language Models (LLMs) have gained popularity in recent years with the advancement of Natural Language Processing (NLP). This study inspects and highlights the need to address biases in LLMs amid growing generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in different generative models such as BERT, GPT-3.5, and ADA.
arXiv Detail & Related papers (2025-11-18T05:43:34Z) - A Comprehensive Study of Implicit and Explicit Biases in Large Language Models [1.0555164678638427]
This study highlights the need to address biases in Large Language Models amid growing generative AI. We studied bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in multiple generative models such as BERT and GPT-3.5. Results indicated that fine-tuned models struggled with gender biases but excelled at identifying and avoiding racial biases.
arXiv Detail & Related papers (2025-11-18T05:27:17Z) - Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models [81.45743826739054]
A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We create person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content.
arXiv Detail & Related papers (2025-10-04T07:51:59Z) - FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes [23.71105683137539]
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. We introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate the fairness of LLMs across 85 socio-demographic identity groups.
arXiv Detail & Related papers (2025-06-29T06:31:06Z) - Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend toward LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z) - Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources. We develop a checklist comprising objective and subjective queries to analyze the behavior of large language models. We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z) - On Fairness of Unified Multimodal Large Language Model for Image Generation [19.122441856516215]
We benchmark the latest U-MLLMs and find that most exhibit significant demographic biases, such as gender and race bias. Our analysis shows that bias originates primarily from the language model. We propose a novel balanced preference model to balance the demographic distribution with synthetic data.
arXiv Detail & Related papers (2025-02-05T18:21:03Z) - How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - The Impact of Unstated Norms in Bias Analysis of Language Models [0.03495246564946556]
Counterfactual bias evaluation is a widely used approach to quantifying bias. We find that template-based probes can lead to unrealistic bias measurements.
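The template-based probing that this critique targets typically swaps demographic terms into a fixed sentence frame and compares a model score across the variants. A generic sketch follows; the template, group list, and dummy scoring function are all invented for illustration and are not this paper's setup:

```python
# Generic counterfactual template probe: fill one frame with different
# demographic terms and compare a per-sentence model score.
TEMPLATE = "The {group} engineer presented the design review."
GROUPS = ["Hindu", "Muslim", "Christian", "Sikh"]

def score(sentence: str) -> float:
    """Stand-in for a real model score (e.g., mean token log-probability)."""
    return float(len(sentence))  # dummy scorer so the sketch runs end to end

# One counterfactual variant per group, scored identically.
probes = {g: TEMPLATE.format(group=g) for g in GROUPS}
scores = {g: score(s) for g, s in probes.items()}

# A simple "bias gap" under this probe: the spread of scores across groups.
gap = max(scores.values()) - min(scores.values())
```

The paper's point is that such fixed frames can encode unstated norms (the unmarked default identity), so gaps measured this way may not reflect realistic usage.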
arXiv Detail & Related papers (2024-04-04T14:24:06Z) - What's in a Name? Auditing Large Language Models for Race and Gender Bias [45.1187517058961]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4. We find that model-generated advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs [13.744746481528711]
Large Language Models (LLMs) are widely used to simulate human responses across diverse contexts. We evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness. We find that in zero-shot settings, most models' predictions for both tasks align more closely with labels from White participants than with those from Asian or Black participants.
arXiv Detail & Related papers (2023-11-16T10:02:24Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models [18.201326983938014]
Large Language Models (LLMs) can encode societal biases, exposing their users to representational harms.
We quantify stereotypical bias in popular LLMs according to an Indian-centric frame through Indian-BhED, a first-of-its-kind dataset.
We find that the majority of LLMs tested have a strong propensity to output stereotypes in the Indian context.
arXiv Detail & Related papers (2023-09-15T17:38:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.