FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models
- URL: http://arxiv.org/abs/2302.05508v1
- Date: Fri, 10 Feb 2023 20:54:10 GMT
- Title: FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models
- Authors: Hrishikesh Viswanath and Tianyi Zhang
- Abstract summary: Studies have shown that large pretrained language models exhibit biases against social groups based on race, gender, and other attributes.
Various researchers have proposed mathematical tools for quantifying and identifying these biases.
We present a comprehensive quantitative evaluation of different kinds of biases, such as those related to race, gender, ethnicity, and age.
- Score: 7.250074804839615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Studies have shown that large pretrained language models exhibit biases
against social groups based on race, gender, and other attributes, which they inherit
from the datasets they are trained on. Various researchers have proposed mathematical
tools for quantifying and identifying these biases, and methods have been proposed to
mitigate them. In this paper, we present a comprehensive quantitative evaluation of
different kinds of biases, such as those related to race, gender, ethnicity, and age,
exhibited by popular pretrained language models such as BERT and GPT-2. We also present
a toolkit that provides plug-and-play interfaces for connecting these bias-quantification
tools to large pretrained language models such as BERT and GPT-2, and that lets users
test custom models against the same metrics. The toolkit additionally allows users to
debias existing and custom models using the debiasing techniques proposed so far. The
toolkit is available at
https://github.com/HrishikeshVish/Fairpy.
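A minimal illustrative sketch (not the Fairpy API, whose interfaces are documented in the repository above) of the kind of metric such a toolkit wraps: comparing the probabilities a masked language model assigns to paired demographic terms in the same context, using the Hugging Face transformers library. The template sentence, the term pair, and the model choice are illustrative assumptions, not the paper's benchmark.

```python
# Minimal sketch (not the Fairpy API): probe a masked LM for asymmetries in the
# probability it assigns to paired demographic terms in the same context.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"          # any masked LM on the Hub works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

# Illustrative template and term pair; a real evaluation would use a curated
# benchmark (e.g. StereoSet- or CrowS-Pairs-style data).
template = f"{tokenizer.mask_token} worked as a nurse."
pair = ("he", "she")

inputs = tokenizer(template, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits[0, mask_pos].softmax(dim=-1).squeeze(0)

for term in pair:
    term_id = tokenizer.convert_tokens_to_ids(term)
    print(f"P({term!r} | template) = {probs[term_id].item():.4f}")
# A large, consistent gap between the two probabilities across many templates
# is one simple signal of gender association for the occupation word.
```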
Related papers
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z) - Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work exploits the bias that language models absorb from their training data to characterize the biases of six different online communities.
The bias of the resulting models is evaluated by prompting them with different demographic terms and comparing the sentiment and toxicity of the resulting generations (a minimal sketch of this prompt-and-score recipe follows this entry).
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
arXiv Detail & Related papers (2023-06-04T08:09:26Z)
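A hedged sketch of the prompt-and-score recipe described in the entry above, not the paper's exact pipeline: generate continuations of demographic prompts with GPT-2 and compare average sentiment. The prompts, the sampling settings, and the use of a generic sentiment model (rather than also scoring toxicity) are illustrative assumptions.

```python
# Sketch: prompt-then-score bias probing. Generate continuations for demographic
# prompts and compare the sentiment of what the model writes.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")   # default English sentiment model

# Illustrative prompts; a real study would use a vetted prompt set and would
# also score toxicity, not only sentiment.
prompts = {
    "group_a": "The women in this community are",
    "group_b": "The men in this community are",
}

for group, prompt in prompts.items():
    outs = generator(prompt, max_new_tokens=30, num_return_sequences=5,
                     do_sample=True, pad_token_id=50256)
    texts = [o["generated_text"] for o in outs]
    scores = sentiment(texts)
    # Map labels to a signed score so the group averages are comparable.
    signed = [s["score"] if s["label"] == "POSITIVE" else -s["score"] for s in scores]
    print(group, sum(signed) / len(signed))
```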
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- FineDeb: A Debiasing Framework for Language Models [3.7698299781999376]
We propose FineDeb, a two-phase debiasing framework for language models.
Our results show that FineDeb offers stronger debiasing in comparison to other methods.
Our framework is generalizable for demographics with multiple classes.
arXiv Detail & Related papers (2023-02-05T18:35:21Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (a minimal sketch of the projection step follows this entry).
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
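A minimal sketch of the projection idea in the entry above, under stated assumptions: the bias direction is estimated from differences of paired prompts and removed from CLIP text embeddings with a plain orthogonal projection. The paper's calibrated projection matrix is not reproduced, and the prompt pairs and model choice are illustrative.

```python
# Sketch: remove an estimated bias direction from text embeddings by orthogonal
# projection (the paper's calibrated construction is not reproduced here).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        return model.get_text_features(**inputs)

# Paired prompts differing only in the protected attribute (illustrative).
pairs = [("a photo of a man", "a photo of a woman"),
         ("a photo of a male doctor", "a photo of a female doctor")]
diffs = torch.stack([embed([a])[0] - embed([b])[0] for a, b in pairs])
v = diffs.mean(dim=0)
v = v / v.norm()                      # unit-length bias direction

d = v.shape[0]
P = torch.eye(d) - torch.outer(v, v)  # projection onto the orthogonal complement

query = embed(["a photo of a doctor"])  # shape (1, d)
debiased = query @ P.T                  # embedding with the bias direction removed
print(debiased.shape)
```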
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood (a minimal pseudo-likelihood sketch follows this entry).
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
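A hedged sketch of sentence pseudo-likelihood scoring as used in template-based probes like the entry above: each token is masked in turn, and the masked LM's log-probabilities of the original tokens are summed, so templates filled with different group terms can be compared. The model, template, and group terms below are illustrative assumptions.

```python
# Sketch: pseudo-log-likelihood (PLL) of a sentence under a masked LM, used to
# compare templates filled with different demographic terms.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-multilingual-cased"   # swap in any monolingual BERT variant
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc.input_ids[0]
    total = 0.0
    # Mask one token at a time (skipping [CLS]/[SEP]) and sum its log-probability.
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = logits[0, i].log_softmax(dim=-1)
        total += log_probs[ids[i]].item()
    return total

template = "The {} are good at mathematics."   # illustrative template
for group in ("women", "men"):
    print(group, round(pseudo_log_likelihood(template.format(group)), 2))
# A systematic PLL gap across many templates suggests a bias association.
```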
- MRCLens: an MRC Dataset Bias Detection Toolkit [82.44296974850639]
We introduce MRCLens, a toolkit that detects whether biases exist before users train the full model.
To accompany the toolkit, we also provide a categorization of common biases in MRC.
arXiv Detail & Related papers (2022-07-18T21:05:39Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- "I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset [12.000335510088648]
We present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes.
HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms.
We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models.
arXiv Detail & Related papers (2022-05-18T20:37:25Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)