Persistent Anti-Muslim Bias in Large Language Models
- URL: http://arxiv.org/abs/2101.05783v2
- Date: Mon, 18 Jan 2021 17:02:28 GMT
- Title: Persistent Anti-Muslim Bias in Large Language Models
- Authors: Abubakar Abid, Maheen Farooqi, James Zou
- Abstract summary: GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias.
We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation.
For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to "money" in 5% of test cases.
- Score: 13.984800635696566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been observed that large-scale language models capture undesirable
societal biases, e.g. relating to race and gender; yet religious bias has been
relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual
language model, captures persistent Muslim-violence bias. We probe GPT-3 in
various ways, including prompt completion, analogical reasoning, and story
generation, to understand this anti-Muslim bias, demonstrating that it appears
consistently and creatively in different uses of the model and that it is
severe even compared to biases about other religious groups. For instance,
"Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is
mapped to "money" in 5% of test cases. We quantify the positive distraction
needed to overcome this bias with adversarial text prompts, and find that using
the six most positive adjectives reduces violent completions for "Muslims"
from 66% to 20%, which is still higher than for other religious groups.
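The abstract's measurement, comparing the rate of violent completions with and without a positive-adjective prefix, can be sketched as follows. This is a minimal illustration, not the paper's code: `toy_complete` is a stand-in for a real text-generation model (the paper used GPT-3), and both the violence keyword list and the adjective list are illustrative placeholders (the paper's exact six adjectives are not given in this abstract).

```python
# Illustrative sketch of measuring violent-completion rates, with and
# without a "positive distraction" prefix in the prompt.
VIOLENT_WORDS = {"shot", "killed", "attacked", "bomb", "terror"}  # illustrative list

# Hypothetical adjective set; the paper's actual six adjectives are not
# listed in this abstract.
POSITIVE_ADJECTIVES = ["calm", "peaceful", "kind", "generous", "honest", "hard-working"]

def is_violent(completion: str) -> bool:
    # Crude keyword check over punctuation-stripped tokens.
    tokens = {t.strip(".,!?") for t in completion.lower().split()}
    return any(w in tokens for w in VIOLENT_WORDS)

def violent_rate(complete, prompt: str, n: int = 100) -> float:
    # Fraction of n sampled completions flagged as violent.
    return sum(is_violent(complete(prompt)) for _ in range(n)) / n

def debiased_prompt(group: str) -> str:
    # Prepend positive adjectives as an adversarial "distraction" prefix.
    adjs = ", ".join(POSITIVE_ADJECTIVES)
    return f"{group} people are {adjs}. Two {group}s walked into a"

def toy_complete(prompt: str) -> str:
    # Toy deterministic stand-in for a real model, for demonstration only.
    return "mosque and prayed" if "calm" in prompt else "bar and were attacked"

base = violent_rate(toy_complete, "Two Muslims walked into a", n=10)
debiased = violent_rate(toy_complete, debiased_prompt("Muslim"), n=10)
```

With a real stochastic model the two rates would be estimated from many sampled completions per prompt; the toy stub here only shows the bookkeeping.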
Related papers
- What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether such bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
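A logistic Bradley-Terry probe of this kind can be sketched as follows: a preference between two words is modeled as a logistic function of a learned weight vector applied to the difference of their hidden vectors. This is a minimal NumPy sketch on synthetic vectors, not the paper's implementation; the dimensions, data, and training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-ins for LLM hidden vectors of word pairs (A, B).
# y = 1 means word A is "preferred" over word B.
dim, n_pairs = 16, 200
h_a = rng.normal(size=(n_pairs, dim))
h_b = rng.normal(size=(n_pairs, dim))
true_w = rng.normal(size=dim)
y = ((h_a - h_b) @ true_w > 0).astype(float)  # synthetic preference labels

# Logistic Bradley-Terry probe: P(A preferred over B) = sigmoid(w . (h_A - h_B)).
x = h_a - h_b          # pairwise feature: difference of hidden vectors
w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    p = sigmoid(x @ w)
    w -= lr * x.T @ (p - y) / n_pairs  # gradient step on mean log-loss

acc = ((sigmoid(x @ w) > 0.5) == (y == 1.0)).mean()
```

Using the difference of hidden vectors makes the probe antisymmetric by construction: swapping A and B flips the sign of the logit, as a Bradley-Terry model requires.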
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
- Muslim-Violence Bias Persists in Debiased GPT Models [18.905135223612046]
Using common names associated with the religions in prompts increases several-fold the rate of violent completions.
Our results show the need for continual de-biasing of models.
arXiv Detail & Related papers (2023-10-25T19:39:58Z)
- Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West [19.286414041202818]
Large Language Models (LLMs) can encode societal biases, exposing their users to representational harms.
We quantify stereotypical bias in popular LLMs according to an Indian-centric frame and compare bias levels between the Indian and Western contexts.
We find that the majority of LLMs tested are strongly biased towards stereotypes in the Indian context, especially as compared to the Western context.
arXiv Detail & Related papers (2023-09-15T17:38:41Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- PACO: Provocation Involving Action, Culture, and Oppression [13.70482307997736]
In India, people identify with a particular group based on certain attributes such as religion.
The same religious groups are often provoked against each other.
Previous studies show the role of provocation in increasing tensions between India's two prominent religious groups: Hindus and Muslims.
arXiv Detail & Related papers (2023-03-19T04:39:36Z)
- Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts [24.633323508534254]
Using common names associated with the religions in prompts yields a highly significant increase in violent completions.
Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions.
Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.
arXiv Detail & Related papers (2022-08-08T20:59:16Z)
- Intersectional Bias in Causal Language Models [0.0]
We examine GPT-2 and GPT-NEO models, ranging in size from 124 million to 2.7 billion parameters.
We conduct an experiment combining up to three social categories - gender, religion and disability - into unconditional or zero-shot prompts.
Our results confirm earlier tests conducted with auto-regressive causal models, including the GPT family of models.
arXiv Detail & Related papers (2021-07-16T03:46:08Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.