Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
- URL: http://arxiv.org/abs/2509.23805v1
- Date: Sun, 28 Sep 2025 11:08:39 GMT
- Title: Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
- Authors: Arti Rani, Shweta Singh, Nihar Ranjan Sahoo, Gaurav Kumar Nayak,
- Abstract summary: We tackle the novel problem of open-set bias detection and mitigation in text-based question answering tasks.<n>We introduce OpenBiasBench, a benchmark designed to evaluate biases across a wide range of categories and subgroups.<n>We also propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method.
- Score: 6.958242323649994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have achieved remarkable success on question answering (QA) tasks, yet they often encode harmful biases that compromise fairness and trustworthiness. Most existing bias mitigation approaches are restricted to predefined categories, limiting their ability to address novel or context-specific emergent biases. To bridge this gap, we tackle the novel problem of open-set bias detection and mitigation in text-based QA. We introduce OpenBiasBench, a comprehensive benchmark designed to evaluate biases across a wide range of categories and subgroups, encompassing both known and previously unseen biases. Additionally, we propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method that leverages adapter modules to mitigate existing social and stereotypical biases while generalizing to unseen ones. Compared to the state-of-the-art BMBI method, Open-DeBias improves QA accuracy on BBQ dataset by nearly $48\%$ on ambiguous subsets and $6\%$ on disambiguated ones, using adapters fine-tuned on just a small fraction of the training data. Remarkably, the same adapters, in a zero-shot transfer to Korean BBQ, achieve $84\%$ accuracy, demonstrating robust language-agnostic generalization. Through extensive evaluation, we also validate the effectiveness of Open-DeBias across a broad range of NLP tasks, including StereoSet and CrowS-Pairs, highlighting its robustness, multilingual strength, and suitability for general-purpose, open-domain bias mitigation. The project page is available at: https://sites.google.com/view/open-debias25
Related papers
- KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement [5.243877326529689]
Large language models (LLMs) exhibit social biases that reinforce harmful stereotypes, limiting their safe deployment.<n>We propose KnowBias, a framework that mitigates bias by strengthening, rather than suppressing, neurons encoding bias-knowledge.<n>KnowBias identifies neurons encoding bias knowledge using a small set of bias-knowledge questions via attribution-based analysis, and selectively enhances them at inference time.
arXiv Detail & Related papers (2026-01-29T15:32:38Z) - Adaptive Generation of Bias-Eliciting Questions for LLMs [18.608477560948003]
Large language models (LLMs) are now widely deployed in user-facing applications, reaching hundreds of millions worldwide.<n>We introduce a counterfactual bias evaluation framework that automatically generates realistic, open-ended questions over sensitive attributes such as sex, race, or religion.<n>We also capture distinct response dimensions that are increasingly relevant in user interactions, such as asymmetric refusals and explicit acknowledgment of bias.
arXiv Detail & Related papers (2025-10-14T13:08:10Z) - Rethinking Prompt-based Debiasing in Large Language Models [40.90578215191079]
Investigating bias in large language models (LLMs) is crucial for developing trustworthy AI.<n>While prompt-based through prompt engineering is common, its effectiveness relies on the assumption that models inherently understand biases.<n>Our study systematically analyzed this assumption using the BBQ and StereoSet benchmarks on both open-source models as well as commercial GPT model.
arXiv Detail & Related papers (2025-03-12T10:06:03Z) - Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings [13.686732204665738]
We extend an existing BBQ dataset by incorporating fill-in-the-blank and short-answer question types.<n>Our finding reveals that LLMs produce responses that are more biased against certain protected attributes, like age and socio-economic status.<n>Our debiasing approach combined zero-shot, few-shot, and chain-of-thought could significantly reduce the level of bias to almost 0.
arXiv Detail & Related papers (2024-12-09T01:29:47Z) - GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
arXiv Detail & Related papers (2024-08-29T16:51:07Z) - Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.<n>FAST surpasses state-of-the-art baselines with superior debiasing performance.<n>This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs)<n>VLBiasBench features a dataset that covers nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status, as well as two intersectional bias categories: race x gender and race x social economic status.<n>We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z) - OpenBias: Open-set Bias Detection in Text-to-Image Generative Models [108.2219657433884]
We tackle the challenge of open-set bias detection in text-to-image generative models presenting OpenBias.
OpenBias identifies and quantifies the severity of biases agnostically, without access to any precompiled set.
We study the behavior of Stable Diffusion 1.5, 2, and XL emphasizing new biases, never investigated before.
arXiv Detail & Related papers (2024-04-11T17:59:56Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z) - Few-shot Instruction Prompts for Pretrained Language Models to Detect
Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs)
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.