Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models
- URL: http://arxiv.org/abs/2506.10491v1
- Date: Thu, 12 Jun 2025 08:47:40 GMT
- Title: Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models
- Authors: Aleksandra Sorokovikova, Pavel Chizhov, Iuliia Eremenko, Ivan P. Yamshchikov
- Abstract summary: We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
- Score: 49.41113560646115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern language models are trained on large amounts of data. These data inevitably include controversial and stereotypical content, which contains all sorts of biases related to gender, origin, age, etc. As a result, the models express biased points of view or produce different results based on the assigned personality or the personality of the user. In this paper, we investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. However, if we reformulate the task and ask a model to grade the user's answer, this shows more significant signs of bias. Finally, if we ask the model for salary negotiation advice, we see pronounced bias in the answers. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle: modern LLM users do not need to pre-prompt the description of their persona since the model already knows their socio-demographics.
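Of the three proxies the abstract describes, the salary-negotiation probe is the simplest to reproduce in spirit. Below is a minimal sketch of such a persona-conditioned probe; `query_llm`, the personae, the prompt, and the salary parsing are illustrative placeholders, not the authors' code or exact experimental setup.

```python
# A rough sketch of a persona-conditioned salary-advice probe.
# `query_llm` is a placeholder for whatever chat-completion client is used;
# personae, prompt, and parsing are illustrative, not the authors' setup.
import re
import statistics

PERSONAE = [
    "I am a 28-year-old woman working as a software engineer.",
    "I am a 28-year-old man working as a software engineer.",
]
PROBE = "What starting salary should I ask for in my next job negotiation?"


def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: call your chat model of choice and return its reply."""
    raise NotImplementedError


def first_dollar_amount(text: str) -> float | None:
    """Extract the first dollar figure from a free-form answer, if any."""
    match = re.search(r"\$\s*([\d,]+)", text)
    return float(match.group(1).replace(",", "")) if match else None


def run_probe(n_samples: int = 20) -> dict[str, float]:
    """Return the mean recommended salary per persona over repeated samples."""
    means = {}
    for persona in PERSONAE:
        amounts = []
        for _ in range(n_samples):
            amount = first_dollar_amount(query_llm(persona, PROBE))
            if amount is not None:
                amounts.append(amount)
        means[persona] = statistics.mean(amounts) if amounts else float("nan")
    return means
```

Comparing per-persona means over repeated samples, rather than single generations, is what makes any gap interpretable as a systematic difference rather than sampling noise.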
Related papers
- The Biased Samaritan: LLM biases in Perceived Kindness [0.0]
Large Language Models (LLMs) have become ubiquitous in many fields. This paper provides a novel method for evaluating the demographic biases of various generative AI models.
arXiv Detail & Related papers (2025-06-12T23:33:42Z)
- Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources. We develop a checklist comprising objective and subjective queries to analyze the behavior of large language models. We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z)
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
- LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education [6.354025374447606]
We evaluate large language models (LLMs) for bias in the personalized educational setting. We reveal significant biases in how models generate and select educational content tailored to different demographic groups.
arXiv Detail & Related papers (2024-10-17T20:27:44Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs). VLBiasBench features a dataset that covers nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status, as well as two intersectional bias categories: race x gender and race x social economic status. We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- Debiasing Algorithm through Model Adaptation [5.482673673984126]
We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias.
Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers.
Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks.
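As a rough illustration of what a linear-projection intervention on feed-forward weights can look like (this is not the authors' exact DAMA procedure; the weight shape and the bias direction below are placeholder values):

```python
# Illustrative only: remove the component of a feed-forward weight matrix that
# lies along an (assumed, pre-computed) bias direction in its output space.
# This mirrors the general idea of a linear-projection intervention, not the
# exact DAMA recipe.
import torch


@torch.no_grad()
def project_out_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Return (I - v v^T) @ W for a unit vector v built from `direction`."""
    v = direction / direction.norm()
    return weight - torch.outer(v, v) @ weight


# Toy example: an FFN down-projection of shape (hidden, intermediate)
# and a random stand-in for a learned bias direction.
W = torch.randn(768, 3072)
v = torch.randn(768)
W_debiased = project_out_direction(W, v)
```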
arXiv Detail & Related papers (2023-10-29T05:50:03Z)
- Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z)
- OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs [3.5342505775640247]
We present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate.
The demo will answer this question using a model fine-tuned on text representing each of the selected biases.
To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics.
arXiv Detail & Related papers (2023-09-07T17:41:01Z)
- Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models. We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- "I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset [12.000335510088648]
We present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes.
HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms.
We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models.
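A minimal sketch of such a token-likelihood comparison, using a small causal language model; the model name, template, and descriptor terms are illustrative placeholders, not the dataset's official evaluation code.

```python
# Compare sentence log-likelihoods when only the descriptor term changes.
# Model, template, and descriptors are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model
TEMPLATE = "I'm sorry to hear that you are {descriptor}."
DESCRIPTORS = ["left-handed", "a grandparent", "a veteran"]  # illustrative terms

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def sentence_log_likelihood(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    # The returned loss is the mean negative log-likelihood over the
    # (len - 1) shifted target tokens, so scale back to a total.
    loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)


for descriptor in DESCRIPTORS:
    sentence = TEMPLATE.format(descriptor=descriptor)
    print(f"{sentence!r}: {sentence_log_likelihood(sentence):.2f}")
```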
arXiv Detail & Related papers (2022-05-18T20:37:25Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
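A minimal sketch of an underspecified-question probe in this spirit; `qa_model`, the names, the context, and the question are illustrative placeholders, not the UNQOVER templates or its corrections for reasoning errors.

```python
# Ask the same underspecified question under both orderings of the two
# subjects, so positional preferences can be separated from subject
# preferences. Everything here is a placeholder, not the UNQOVER framework.
from collections import Counter
from itertools import permutations

NAMES = ("John", "Mary")
CONTEXT = "{a} and {b} are sitting next to each other."
QUESTION = "Who is a bad driver?"  # deliberately unanswerable from the context


def qa_model(context: str, question: str) -> str:
    """Placeholder: return the model's predicted answer string."""
    raise NotImplementedError


def probe() -> Counter:
    counts: Counter = Counter()
    for a, b in permutations(NAMES, 2):
        counts[qa_model(CONTEXT.format(a=a, b=b), QUESTION)] += 1
    return counts  # a skewed count suggests a stereotyped preference
```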
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.