Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
- URL: http://arxiv.org/abs/2402.12649v2
- Date: Sun, 16 Feb 2025 18:57:20 GMT
- Title: Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
- Authors: Kristian Lum, Jacy Reese Anthis, Kevin Robinson, Chirag Nagpal, Alexander D'Amour
- Abstract summary: Standard benchmarks of bias and fairness in large language models (LLMs) measure the association between social attributes implied in user prompts and short responses.
We develop analogous RUTEd evaluations from three contexts of real-world use.
We find that standard bias metrics have no significant correlation with the more realistic bias metrics.
- Score: 49.3814117521631
- Abstract: Standard benchmarks of bias and fairness in large language models (LLMs) measure the association between social attributes implied in user prompts and short LLM responses. In the commonly studied domain of gender-occupation bias, we test whether these benchmarks are robust to lengthening the LLM responses as a measure of Realistic Use and Tangible Effects (i.e., RUTEd evaluations). From the current literature, we adapt three standard bias metrics (neutrality, skew, and stereotype), and we develop analogous RUTEd evaluations from three contexts of real-world use: children's bedtime stories, user personas, and English language learning exercises. We find that standard bias metrics have no significant correlation with the more realistic bias metrics. For example, selecting the least biased model based on the standard "trick tests" coincides with selecting the least biased model as measured in more realistic use no more often than random chance. We suggest that there is not yet evidence to justify standard benchmarks as reliable proxies of real-world biases, and we encourage further development of context-specific RUTEd evaluations.
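As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch (not the authors' code): toy skew and neutrality metrics computed over long-form generations, plus a rank correlation between per-model scores from a standard short-response benchmark and a RUTEd-style evaluation. The pronoun lists, function names, model names, and all numbers are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): toy "skew" and "neutrality" metrics
# over long-form generations, plus a rank correlation between per-model scores
# from a standard short-response benchmark and a RUTEd-style evaluation.
# Pronoun lists, model names, and all numbers are illustrative placeholders.
import re
from scipy.stats import spearmanr

FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}

def gender_counts(text: str) -> tuple[int, int]:
    """Count feminine and masculine pronouns in one generated text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in FEMININE for t in tokens), sum(t in MASCULINE for t in tokens)

def skew(texts: list[str]) -> float:
    """Signed pronoun imbalance in [-1, 1]; 0 means balanced usage."""
    f = sum(gender_counts(t)[0] for t in texts)
    m = sum(gender_counts(t)[1] for t in texts)
    total = f + m
    return 0.0 if total == 0 else (f - m) / total

def neutrality(texts: list[str]) -> float:
    """Fraction of texts that contain no gendered pronouns at all."""
    return sum(1 for t in texts if gender_counts(t) == (0, 0)) / len(texts)

# Toy generations standing in for, e.g., bedtime stories about an occupation.
stories = ["Once upon a time she became an engineer and built her first bridge.",
           "The engineer checked the design and approved it."]
print(f"skew = {skew(stories):+.2f}, neutrality = {neutrality(stories):.2f}")

# If the standard "trick test" metric were a good proxy, per-model scores
# should rank-correlate with the realistic-use scores. Values are made up.
standard_scores = {"model_a": 0.12, "model_b": 0.05, "model_c": 0.30}
ruted_scores = {"model_a": 0.40, "model_b": 0.35, "model_c": 0.10}

models = sorted(standard_scores)
rho, p = spearmanr([standard_scores[m] for m in models],
                   [ruted_scores[m] for m in models])
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```

A near-zero rank correlation between the two score lists is the kind of result the abstract reports.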
Related papers
- Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark [53.876493664396506]
Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions.
This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entity mentions rather than context.
We propose a debiased relation extraction benchmark DREB that breaks the pseudo-correlation between entity mentions and relation types through entity replacement.
To establish a new baseline on DREB, we introduce MixDebias, a debiasing method combining data-level and model training-level techniques.
arXiv Detail & Related papers (2025-01-02T17:01:06Z)
- Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach [7.969162168078149]
Large language models (LLMs) often reflect real-world biases, leading to efforts to mitigate these effects.
We introduce a novel metric to assess bias using fact-based criteria and real-world statistics.
arXiv Detail & Related papers (2024-11-26T11:32:43Z)
- Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks [3.973239756262797]
This study examines such biases in open-generation benchmarks like BOLD and SAGED.
Results reveal unequal treatment of demographic descriptors, calling for more robust bias metric models.
arXiv Detail & Related papers (2024-10-14T20:08:40Z)
- COBIAS: Contextual Reliability in Bias Assessment [14.594920595573038]
Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices.
Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets.
We introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear.
arXiv Detail & Related papers (2024-02-22T10:46:11Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- An Offline Metric for the Debiasedness of Click Models [52.25681483524383]
Click models are a common method for extracting information from user clicks.
Recent work shows that the current evaluation practices in the community fail to guarantee that a well-performing click model generalizes well to downstream tasks.
We introduce the concept of debiasedness in click modeling and derive a metric for measuring it.
arXiv Detail & Related papers (2023-04-19T10:59:34Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks (a generic cosine-association sketch, separate from SAME, appears after this list).
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models [2.567384209291337]
An increasing awareness of biased patterns in natural language processing resources has motivated many metrics to quantify "bias" and "fairness".
We survey the existing literature on fairness metrics for pretrained language models and experimentally evaluate their compatibility.
We find that many metrics are not compatible with each other and depend heavily on (i) templates, (ii) attribute and target seeds, and (iii) the choice of embeddings.
arXiv Detail & Related papers (2021-12-14T15:04:56Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
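As referenced in the SAME entry above, here is a minimal generic sketch of a cosine-based association score for word embeddings. It is an illustrative assumption, not the SAME formula; the attribute sets and vectors are placeholders.

```python
# Generic cosine-association sketch for word embeddings (illustrative only;
# this is not the SAME score's formula). A word is scored by the difference
# in mean cosine similarity to two attribute sets, e.g. feminine vs. masculine.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word: np.ndarray, attrs_a: list, attrs_b: list) -> float:
    """Positive values lean toward attribute set A, negative toward set B."""
    sim_a = np.mean([cosine(word, a) for a in attrs_a])
    sim_b = np.mean([cosine(word, b) for b in attrs_b])
    return float(sim_a - sim_b)

# Placeholder vectors standing in for embeddings of "nurse", "she", and "he".
rng = np.random.default_rng(0)
nurse, she, he = (rng.normal(size=50) for _ in range(3))
print(f"association(nurse) = {association(nurse, [she], [he]):+.3f}")
```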