Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
- URL: http://arxiv.org/abs/2410.11059v1
- Date: Mon, 14 Oct 2024 20:08:40 GMT
- Title: Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
- Authors: Nathaniel Demchak, Xin Guan, Zekun Wu, Ziyi Xu, Adriano Koshiyama, Emre Kazim,
- Abstract summary: This study examines such biases in open-generation benchmarks like BOLD and SAGED.
Results reveal unequal treatment of demographic descriptors, calling for more robust bias metric models.
- Score: 3.973239756262797
- License:
- Abstract: Open-generation bias benchmarks evaluate social biases in Large Language Models (LLMs) by analyzing their outputs. However, the classifiers used in analysis often have inherent biases, leading to unfair conclusions. This study examines such biases in open-generation benchmarks like BOLD and SAGED. Using the MGSD dataset, we conduct two experiments. The first uses counterfactuals to measure prediction variations across demographic groups by altering stereotype-related prefixes. The second applies explainability tools (SHAP) to validate that the observed biases stem from these counterfactuals. Results reveal unequal treatment of demographic descriptors, calling for more robust bias metric models.
Related papers
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and their manifestation in LLMs.
Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z) - Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models [10.73340009530019]
This study addresses two such biases within Large Language Models (LLMs): representative bias and affinity bias.
We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS)
Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men.
Our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to bias fingerprints'
arXiv Detail & Related papers (2024-05-23T13:35:34Z) - COBIAS: Contextual Reliability in Bias Assessment [14.594920595573038]
Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices.
Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets.
We introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear.
arXiv Detail & Related papers (2024-02-22T10:46:11Z) - Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation [49.3814117521631]
Standard benchmarks of bias and fairness in large language models (LLMs) measure the association between social attributes implied in user prompts and short responses.
We develop analogous RUTEd evaluations from three contexts of real-world use.
We find that standard bias metrics have no significant correlation with the more realistic bias metrics.
arXiv Detail & Related papers (2024-02-20T01:49:15Z) - ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z) - IBADR: an Iterative Bias-Aware Dataset Refinement Framework for
Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, this generator can effectively learn the correspondence relationship between bias indicators and samples.
arXiv Detail & Related papers (2023-11-01T04:50:38Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.