Related papers: Bias after Prompting: Persistent Discrimination in Large Language Models

URL: http://arxiv.org/abs/2509.08146v1
Date: Tue, 09 Sep 2025 20:59:50 GMT
Title: Bias after Prompting: Persistent Discrimination in Large Language Models
Authors: Nivedha Sivakumar, Natalie Mackraz, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
Abstract summary: We find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remains moderate to strong across demographics and tasks. We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduce bias transfer across models, tasks or demographics.
Score: 9.558263120749356
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remains moderate to strong across demographics and tasks: for example, gender (ρ ≥ 0.94) in co-reference resolution, and age (ρ ≥ 0.98) and religion (ρ ≥ 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution and representational balance (ρ ≥ 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduce bias transfer across models, tasks or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.
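
The ρ values in the abstract are correlations between a bias score measured on the intrinsic (pre-trained) model and the corresponding score measured after prompt adaptation. As a minimal sketch, assuming paired per-item bias scores are already available (how each score is defined is not specified here), such a correlation could be computed as below. The abstract does not state whether ρ is Pearson or Spearman, so both are shown; the function names and toy scores are hypothetical placeholders, not data or code from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy scores (one per evaluation item), for illustration only.
    intrinsic = [0.10, 0.35, 0.20, 0.50, 0.05]
    prompted = [0.12, 0.30, 0.25, 0.55, 0.02]
    print(f"Pearson rho:  {pearson(intrinsic, prompted):.3f}")
    print(f"Spearman rho: {spearman(intrinsic, prompted):.3f}")
```

Spearman's coefficient only asks whether the ordering of bias scores survives prompting, whereas Pearson also measures how linearly the magnitudes track each other; a high value under either means the intrinsic bias largely predicts the post-prompting bias.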

Related papers

From Global to Local: Social Bias Transfer in CLIP [22.508828073380112]
We investigate the phenomenon of bias transfer in prior literature through a comprehensive empirical analysis. We examine how pre-training bias varies between global and local views of data, finding that bias measurement is highly dependent on the subset of data on which it is computed. We explore why this inconsistency occurs, showing that under the current paradigm, representation spaces of different pre-trained CLIPs tend to converge when adapted for downstream tasks.
arXiv (2025-08-25T07:44:03Z)
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs [51.00909549291524]
Large language models (LLMs) exhibit cognitive biases. These biases vary across models and can be amplified by instruction tuning. It remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise.
arXiv (2025-07-09T18:01:14Z)
On the Origins of Sampling Bias: Implications on Fairness Measurement and Mitigation [0.0]
Several sources of bias exist and it is assumed that bias resulting from machine learning is borne equally by different groups. Sampling bias, in particular, is inconsistently used in the literature to describe bias due to the sampling procedure. We introduce clearly defined variants of sampling bias, namely, sample size bias (SSB) and underrepresentation bias (URB).
arXiv (2025-03-23T06:23:07Z)
Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models [4.274270062767065]
In this work, we investigate the bias transfer hypothesis (BTH) under prompt adaptations. We find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior. Our findings highlight the importance of ensuring fairness in pre-trained LLMs.
arXiv (2024-12-04T18:32:42Z)
How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv (2024-11-28T16:20:25Z)
CosFairNet:A Parameter-Space based Approach for Bias Free Learning [1.9116784879310025]
Deep neural networks trained on biased data often inadvertently learn unintended inference rules. We introduce a novel approach to address bias directly in the model's parameter space, preventing its propagation across layers. We show enhanced classification accuracy and debiasing effectiveness across various synthetic and real-world datasets.
arXiv (2024-10-19T13:06:40Z)
Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint. We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b. We propose to mitigate dataset bias via either weighting the objective of each sample n by 1/p(u_n | b_n) or sampling that sample with a weight proportional to 1/p(u_n | b_n).
arXiv (2024-02-05T22:58:06Z)
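The weighting rule in the summary above can be illustrated directly: estimate p(u | b) from empirical counts, then give sample n the weight 1/p(u_n | b_n), or resample with probability proportional to that weight. This is a minimal sketch under stated assumptions (discrete u and b, plug-in count estimates of p(u | b)); the function names and toy data are hypothetical and not taken from the paper's code.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def inverse_conditional_weights(
    u: Sequence[Hashable], b: Sequence[Hashable]
) -> list[float]:
    """Weight sample n by 1 / p(u_n | b_n), with p(u | b) estimated from counts."""
    if len(u) != len(b):
        raise ValueError("u and b must have the same length")
    joint = Counter(zip(u, b))  # count(u, b)
    count_b = Counter(b)  # count(b)
    # p(u_n | b_n) = count(u_n, b_n) / count(b_n), so the inverse is count(b_n) / count(u_n, b_n).
    return [count_b[bn] / joint[(un, bn)] for un, bn in zip(u, b)]


def resample_indices(weights: Sequence[float], k: int, seed: int = 0) -> list[int]:
    """Draw k sample indices (with replacement) with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical toy data: class attribute u is correlated with non-class attribute b.
    u = ["cow", "cow", "cow", "camel", "camel", "cow"]
    b = ["grass", "grass", "grass", "sand", "sand", "sand"]
    weights = inverse_conditional_weights(u, b)
    print(weights)  # [1.0, 1.0, 1.0, 1.5, 1.5, 3.0]
    print(resample_indices(weights, k=10))
```

With count estimates, every class observed under a given value of b receives the same total weight, count(b): in the toy data the rare (cow, sand) sample gets weight 3.0 against 1.5 for each (camel, sand) sample. Reweighting evens out classes that co-occur with a given b, but it cannot create combinations that never appear (there is no (camel, grass) sample to upweight).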
Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model. Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv (2023-12-06T16:15:00Z)
Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased if it learns from a biased example, we measure the bias level of a query instance. We show that our method can be applied to multiple QA formulations across multiple bias categories.
arXiv (2023-10-13T00:49:09Z)
Fighting Fire with Fire: Contrastive Debiasing without Bias-free Data via Generative Bias-transformation [31.944147533327058]
We propose a novel method, Contrastive Debiasing via Generative Bias-transformation (CDvG), which works without explicit bias labels or bias-free samples. Our method demonstrates superior performance compared to prior approaches, especially when bias-free samples are scarce or absent.
arXiv (2021-12-02T07:16:06Z)
Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race. Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables. This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv (2021-09-16T23:40:28Z)
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective. We propose to augment the input sentences in the training data with their corresponding predicate-argument structures. We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv (2020-10-23T16:22:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.