Intersectional Bias in Causal Language Models
- URL: http://arxiv.org/abs/2107.07691v1
- Date: Fri, 16 Jul 2021 03:46:08 GMT
- Title: Intersectional Bias in Causal Language Models
- Authors: Liam Magee, Lida Ghahremanlou, Karen Soldatic, and Shanthi Robertson
- Abstract summary: We examine GPT-2 and GPT-NEO models, ranging in size from 124 million to 2.7 billion parameters.
We conduct an experiment combining up to three social categories - gender, religion and disability - into unconditional or zero-shot prompts.
Our results confirm earlier tests conducted with auto-regressive causal models, including the GPT family of models.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To examine whether intersectional bias can be observed in language
generation, we examine \emph{GPT-2} and \emph{GPT-NEO} models, ranging in size
from 124 million to ~2.7 billion parameters. We conduct an experiment combining
up to three social categories - gender, religion and disability - into
unconditional or zero-shot prompts used to generate sentences that are then
analysed for sentiment. Our results confirm earlier tests conducted with
auto-regressive causal models, including the \emph{GPT} family of models. We
also illustrate why bias may be resistant to techniques that target single
categories (e.g. gender, religion and race), as it can also manifest, in often
subtle ways, in texts prompted by concatenated social categories. To address
these difficulties, we suggest technical and community-based approaches need to
combine to acknowledge and address complex and intersectional language model
bias.
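
As a rough illustration of the experiment the abstract describes (concatenating up to three social categories into a zero-shot prompt, sampling continuations, and scoring them for sentiment), here is a minimal sketch. The category terms, sampling settings, and the VADER sentiment scorer are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the authors' code): build prompts from up to three
# concatenated social categories, sample GPT-2 continuations, and score
# each continuation with VADER sentiment. All category terms are examples.
from itertools import product
from statistics import mean

from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

GENDERS = ["woman", "man"]
RELIGIONS = ["Muslim", "Christian", ""]   # "" means the category is omitted
DISABILITIES = ["deaf", "blind", ""]

generator = pipeline("text-generation", model="gpt2")
analyzer = SentimentIntensityAnalyzer()

for religion, disability, gender in product(RELIGIONS, DISABILITIES, GENDERS):
    # e.g. "The Muslim deaf woman": one to three categories per prompt.
    prompt = "The " + " ".join(w for w in (religion, disability, gender) if w)
    outputs = generator(prompt, max_new_tokens=30, do_sample=True,
                        num_return_sequences=5, pad_token_id=50256)
    scores = [analyzer.polarity_scores(o["generated_text"])["compound"]
              for o in outputs]
    print(f"{prompt!r}: mean compound sentiment {mean(scores):+.3f}")
```

Replacing "gpt2" with "EleutherAI/gpt-neo-2.7B" reaches the upper end of the model sizes the abstract mentions; comparing mean scores across category combinations gives the kind of intersectional sentiment contrast the paper analyses.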
Related papers
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability at low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
- SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models [8.211129045180636]
We introduce a benchmark meant to capture the amplification of social bias, via stigmas, in generative language models.
Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to test for both social bias and model robustness.
We find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles.
arXiv Detail & Related papers (2023-12-12T18:27:44Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood (see the sketch after this entry).
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
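
The template-based pseudo-likelihood scoring referenced in the entry above is commonly implemented by masking each token of a filled template in turn and summing the masked-LM log-probabilities. A minimal sketch under that assumption follows; the model name and templates are illustrative, not taken from the paper.

```python
# Hedged sketch of sentence pseudo-log-likelihood under a masked LM;
# the model choice and example templates are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token_i | all other tokens), masking one position at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A consistent score gap between group-swapped fillings of the same
# template is read as evidence of bias.
print(pseudo_log_likelihood("The doctor said he would arrive soon."))
print(pseudo_log_likelihood("The doctor said she would arrive soon."))
```

As the entry notes, results from this kind of probe can depend heavily on the language of the model and templates.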
- SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models [22.13138599547492]
We propose SODAPOP (SOcial bias Discovery from Answers about PeOPle), a method for discovering social biases in social commonsense question-answering.
By using a social commonsense model to score the generated distractors, we are able to uncover the model's stereotypic associations between demographic groups and an open set of words.
We also test SODAPOP on debiased models and show the limitations of multiple state-of-the-art debiasing algorithms.
arXiv Detail & Related papers (2022-10-13T18:04:48Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2 (a rough sketch of this probing setup follows after this entry).
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
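
As a rough sketch of the occupational probing the entry above describes, one can sample continuations of a templated prompt and tally the predicted occupations per demographic description. The template, sample size, and first-word heuristic here are assumptions for illustration, not the paper's protocol.

```python
# Hedged sketch of occupational-bias probing for GPT-2: sample continuations
# of "The <group> works as a ..." and tally the first generated word as a
# crude occupation proxy. All specifics are illustrative assumptions.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def occupation_counts(group: str, n: int = 50) -> Counter:
    prompt = f"The {group} works as a"
    outputs = generator(prompt, max_new_tokens=3, do_sample=True,
                        num_return_sequences=n, pad_token_id=50256)
    counts: Counter = Counter()
    for o in outputs:
        continuation = o["generated_text"][len(prompt):].strip()
        if continuation:
            counts[continuation.split()[0].strip(".,")] += 1
    return counts

# Compare distributions across intersectional group descriptions.
print(occupation_counts("Black woman").most_common(5))
print(occupation_counts("white man").most_common(5))
```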
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.