Protected group bias and stereotypes in Large Language Models
- URL: http://arxiv.org/abs/2403.14727v1
- Date: Thu, 21 Mar 2024 00:21:38 GMT
- Title: Protected group bias and stereotypes in Large Language Models
- Authors: Hadas Kotek, David Q. Sun, Zidi Xiu, Margit Bowler, Christopher Klein,
- Abstract summary: This paper investigates the behavior of Large Language Models (LLMs) in the domains of ethics and fairness.
We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias.
- Score: 2.1122940074160357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race. Second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions made by a publicly available LLM, which we subject to human annotation. We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases, but appears to amplify them. The model is additionally overly cautious in replies to queries relating to minoritized groups, providing responses that strongly emphasize diversity and equity to an extent that other group characteristics are overshadowed. This suggests that artificially constraining potentially harmful outputs may itself lead to harm, and should be applied in a careful and controlled manner.
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption [10.35915254696156]
We show that outgroup bias manifests as strongly as ingroup favoritism.
Our findings highlight the potential to develop more equitable and balanced language models.
arXiv Detail & Related papers (2024-09-05T18:08:47Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Evaluating Large Language Models through Gender and Racial Stereotypes [0.0]
We conduct a quality comparative study and establish a framework to evaluate language models under the premise of two kinds of biases: gender and race.
We find out that while gender bias has reduced immensely in newer models, as compared to older ones, racial bias still exists.
arXiv Detail & Related papers (2023-11-24T18:41:16Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Fairness in AI Systems: Mitigating gender bias from language-vision
models [0.913755431537592]
We study the extent of the impact of gender bias in existing datasets.
We propose a methodology to mitigate its impact in caption based language vision models.
arXiv Detail & Related papers (2023-05-03T04:33:44Z) - MultiModal Bias: Introducing a Framework for Stereotypical Bias
Assessment beyond Gender and Race in Vision Language Models [40.12132844347926]
We provide a visual and textual bias benchmark called MMBias, consisting of around 3,800 images and phrases covering 14 population subgroups.
We utilize this dataset to assess bias in several prominent self supervised multimodal models, including CLIP, ALBEF, and ViLT.
We introduce a debiasing method designed specifically for such large pre-trained models that can be applied as a post-processing step to mitigate bias.
arXiv Detail & Related papers (2023-03-16T17:36:37Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - How True is GPT-2? An Empirical Analysis of Intersectional Occupational
Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z) - CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked
Language Models [30.582132471411263]
We introduce the Crowd Stereotype Pairs benchmark (CrowS-Pairs)
CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age.
We find that all three of the widely-used sentences we evaluate substantially favor stereotypes in every category in CrowS-Pairs.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.