FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
- URL: http://arxiv.org/abs/2404.06619v1
- Date: Tue, 9 Apr 2024 21:09:22 GMT
- Title: FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
- Authors: Jane Dwivedi-Yu, Raaz Dwivedi, Timo Schick,
- Abstract summary: We present FairPair, an evaluation framework for assessing differential treatment that occurs during ordinary usage.
Unlike prior work, our method factors in the inherent variability that comes from the generation process itself by measuring the sampling variability.
- Score: 33.24762796282484
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The accurate evaluation of differential treatment in language models to specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should have the properties of being robust, extendable to new groups or attributes, and being able to capture biases that appear in typical usage (rather than just extreme, rare cases). Relatedly, bias evaluation should surface not only egregious biases but also ones that are subtle and commonplace, such as a likelihood for talking about appearances with regard to women. We present FairPair, an evaluation framework for assessing differential treatment that occurs during ordinary usage. FairPair operates through counterfactual pairs, but crucially, the paired continuations are grounded in the same demographic group, which ensures equivalent comparison. Additionally, unlike prior work, our method factors in the inherent variability that comes from the generation process itself by measuring the sampling variability. We present an evaluation of several commonly used generative models and a qualitative analysis that indicates a preference for discussing family and hobbies with regard to women.
Related papers
- Compare without Despair: Reliable Preference Evaluation with Generation Separability [20.50638483427141]
We introduce a measure, separability, which estimates how suitable a test instance is for pairwise preference evaluation.
For a candidate test instance, separability samples multiple generations from a pair of models, and measures how distinguishable the two sets of generations are.
Experiments show that instances with high separability values yield more consistent preference ratings from both human- and auto-raters.
arXiv Detail & Related papers (2024-07-02T01:37:56Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - Evaluating Bias and Fairness in Gender-Neutral Pretrained
Vision-and-Language Models [23.65626682262062]
We quantify bias amplification in pretraining and after fine-tuning on three families of vision-and-language models.
Overall, we find that bias amplification in pretraining and after fine-tuning are independent.
arXiv Detail & Related papers (2023-10-26T16:19:19Z) - Evaluating the Fairness of Discriminative Foundation Models in Computer
Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Pretraining (CLIP)
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - Social Biases in Automatic Evaluation Metrics for NLG [53.76118154594404]
We propose an evaluation method based on Word Embeddings Association Test (WEAT) and Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics.
We construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image caption and text summarization tasks.
arXiv Detail & Related papers (2022-10-17T08:55:26Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity [10.144058870887061]
We argue that individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models.
Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone.
arXiv Detail & Related papers (2022-03-14T14:33:39Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.