Related papers: Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing

Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing

URL: http://arxiv.org/abs/2305.12620v2
Date: Wed, 25 Sep 2024 14:06:31 GMT
Title: Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing
Authors: Ioana Baldini, Chhavi Yadav, Manish Nagireddy, Payel Das, Kush R. Varshney,
Abstract summary: We extend an existing bias benchmark for NLI using a combination of LM-generated lexical variations, adversarial filtering, and human validation. We show that BBNLI-next reduces the accuracy of state-of-the-art NLI models from 95.3% to a strikingly low 57.5%. We propose bias measures that take into account both bias and model brittleness.
Score: 33.25539075550122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bias auditing of language models (LMs) has received considerable attention as LMs are becoming widespread. As such, several benchmarks for bias auditing have been proposed. At the same time, the rapid evolution of LMs can make these benchmarks irrelevant in no time. Bias auditing is further complicated by LM brittleness: when a presumably biased outcome is observed, is it due to model bias or model brittleness? We propose enlisting the models themselves to help construct bias auditing datasets that remain challenging, and introduce bias measures that distinguish between different types of model errors. First, we extend an existing bias benchmark for NLI (BBNLI) using a combination of LM-generated lexical variations, adversarial filtering, and human validation. We demonstrate that the newly created dataset BBNLI-next is more challenging than BBNLI: on average, BBNLI-next reduces the accuracy of state-of-the-art NLI models from 95.3%, as observed by BBNLI, to a strikingly low 57.5%. Second, we employ BBNLI-next to showcase the interplay between robustness and bias: we point out shortcomings in current bias scores and propose bias measures that take into account both bias and model brittleness. Third, despite the fact that BBNLI-next was designed with non-generative models in mind, we show that the new dataset is also able to uncover bias in state-of-the-art open-source generative LMs. Note: All datasets included in this work are in English and they address US-centered social biases. In the spirit of efficient NLP research, no model training or fine-tuning was performed to conduct this research. Warning: This paper contains offensive text examples.

Related papers

An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation [0.9430947207126281]
Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora.<n>We empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applying 13 open weight models in the GPT, LLaMA, and Qwen families.<n>We find a trade-off between bias reduction and downstream performance: methods achieving greater bias mitigation degrade accuracy.
arXiv Detail & Related papers (2025-12-02T12:18:48Z)
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs)<n>We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores.<n>With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z)
Beneath the Surface: How Large Language Models Reflect Hidden Bias [7.026605828163043]
We introduce the Hidden Bias Benchmark (HBB), a novel dataset designed to assess hidden bias that bias concepts are hidden within naturalistic, subtly framed contexts in real-world scenarios. We analyze six state-of-the-art Large Language Models, revealing that while models reduce bias in response to overt bias, they continue to reinforce biases in nuanced settings.
arXiv Detail & Related papers (2025-02-27T04:25:54Z)
On Fairness of Unified Multimodal Large Language Model for Image Generation [19.122441856516215]
We benchmark the latest U-MLLMs and find that most exhibit significant demographic biases, such as gender and race bias. Our analysis shows that bias originates primarily from the language model. We propose a novel balanced preference model to balance the demographic distribution with synthetic data.
arXiv Detail & Related papers (2025-02-05T18:21:03Z)
How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and their manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory [29.201402717025335]
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. We have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory.
arXiv Detail & Related papers (2024-08-20T07:40:12Z)
BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns. This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text. By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language.
arXiv Detail & Related papers (2024-07-18T22:32:20Z)
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs) We construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status) We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealing from these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters. We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
arXiv Detail & Related papers (2024-03-27T17:49:31Z)
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrade on others. We formally define LLM's self-bias - the tendency to favor its own generation. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
Current Topological and Machine Learning Applications for Bias Detection in Text [4.799066966918178]
This study utilizes the RedditBias database to analyze textual biases. Four transformer models, including BERT and RoBERTa variants, were explored. Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
arXiv Detail & Related papers (2023-11-22T16:12:42Z)
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
A Generative Approach for Mitigating Structural Biases in Natural Language Inference [24.44419010439227]
In this work, we reformulate the NLI task as a generative task, where a model is conditioned on the biased subset of the input and the label. We show that this approach is highly robust to large amounts of bias. We find that generative models are difficult to train and they generally perform worse than discriminative baselines.
arXiv Detail & Related papers (2021-08-31T17:59:45Z)
Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.