ROBBIE: Robust Bias Evaluation of Large Generative Language Models
- URL: http://arxiv.org/abs/2311.18140v1
- Date: Wed, 29 Nov 2023 23:03:04 GMT
- Title: ROBBIE: Robust Bias Evaluation of Large Generative Language Models
- Authors: David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang,
Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric
Michael Smith
- Abstract summary: Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
- Score: 27.864027322486375
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As generative large language models (LLMs) grow more performant and
prevalent, we must develop comprehensive enough tools to measure and improve
their fairness. Different prompt-based datasets can be used to measure social
bias across multiple text domains and demographic axes, meaning that testing
LLMs on more datasets can potentially help us characterize their biases more
fully, and better ensure equal and equitable treatment of marginalized
demographic groups. In this work, our focus is two-fold:
(1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity
metrics across 12 demographic axes and 5 families of generative LLMs. Out of
those 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in
the paper. Comparing these benchmarks gives us insights into the bias and
toxicity of the compared models. We also explore the frequency of demographic
terms in common LLM pre-training corpora and how this may relate to model
biases.
(2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity
mitigation techniques perform across our suite of measurements. ROBBIE aims to
provide insights for practitioners while deploying a model, emphasizing the
need not only to measure potential harms, but also to understand how they arise by
characterizing the data, to mitigate harms once found, and to balance any trade-offs.
We open-source our analysis code in hopes of encouraging broader measurements
of bias in future LLMs.
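As a concrete illustration of the benchmarking setup in (1), the sketch below scores model continuations for toxicity, aggregates the rates per demographic group, reports a simple worst-case gap between groups, and counts demographic terms in a corpus sample in the spirit of the pre-training-corpus analysis. This is a minimal sketch, not the paper's released analysis code: the toy toxicity scorer, group names, and example strings are illustrative assumptions, and a real evaluation would use a trained classifier and the actual prompt datasets.

```python
# Minimal sketch of prompt-based bias measurement (not the paper's released code).
# score_toxicity is a toy stand-in for a trained toxicity classifier; group names
# and example strings are illustrative only.
from collections import Counter
import re


def score_toxicity(text: str) -> float:
    """Toy stand-in: a real setup would call a trained toxicity classifier."""
    toxic_markers = {"stupid", "disgusting", "awful"}
    tokens = re.findall(r"[a-z']+", text.lower())
    return 1.0 if any(t in toxic_markers for t in tokens) else 0.0


def per_group_toxicity_rate(continuations_by_group: dict[str, list[str]],
                            threshold: float = 0.5) -> dict[str, float]:
    """Fraction of continuations flagged toxic, per demographic group."""
    return {
        group: sum(score_toxicity(c) > threshold for c in texts) / len(texts)
        for group, texts in continuations_by_group.items()
    }


def bias_gap(rates: dict[str, float]) -> float:
    """One simple aggregate: gap between the most- and least-affected group."""
    return max(rates.values()) - min(rates.values())


def demographic_term_frequencies(corpus: list[str], terms: list[str]) -> Counter:
    """Count single-word demographic terms in a sample of pre-training text."""
    counts = Counter({t: 0 for t in terms})
    term_set = set(terms)
    for doc in corpus:
        for tok in re.findall(r"[a-z']+", doc.lower()):
            if tok in term_set:
                counts[tok] += 1
    return counts


if __name__ == "__main__":
    # Continuations would normally be sampled from the LLM under test.
    rates = per_group_toxicity_rate({
        "group_a": ["are kind and helpful", "are awful people"],
        "group_b": ["are kind and helpful", "are thoughtful"],
    })
    print(rates, "gap:", bias_gap(rates))
    print(demographic_term_frequencies(
        ["Grandmothers and grandfathers gathered in the park."],
        ["grandmothers", "grandfathers"]))
```

In practice, per-group rates like these would be computed separately for each prompt dataset and demographic axis, and the gap (or a similar aggregate) tracked before and after applying a mitigation technique to quantify trade-offs.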
Related papers
- Bias Similarity Across Large Language Models [32.0365189539138]
Bias in machine learning models has been a chronic problem.
We take a comprehensive look at ten open- and closed-source Large Language Models.
We measure functional similarity to understand how biases manifest across models.
arXiv Detail & Related papers (2024-10-15T19:21:14Z)
- Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions [25.809599403713506]
Large Language Models (LLMs) are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks.
LLMs are susceptible to societal biases due to their exposure to human-generated data.
This study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases.
arXiv Detail & Related papers (2024-10-03T15:28:05Z)
- A Multi-LLM Debiasing Framework [85.17156744155915]
Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet they have demonstrated biases that perpetuate societal inequalities.
Recent research has shown a growing interest in multi-LLM approaches, which have been demonstrated to be effective in improving the quality of reasoning.
We propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs.
arXiv Detail & Related papers (2024-09-20T20:24:50Z)
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs).
We construct a dataset encompassing nine distinct categories of social bias, including age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status, as well as two intersectional bias categories (race x gender and race x socioeconomic status).
We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing new insights into the biases revealed by these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing evaluation methods have many constraints, and their results offer limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Metrics for Dataset Demographic Bias: A Case Study on Facial Expression Recognition [4.336779198334903]
One of the most prominent types of demographic bias is a statistical imbalance in the representation of demographic groups in datasets.
We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics.
The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
arXiv Detail & Related papers (2023-03-28T11:04:18Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.