Persistent Anti-Muslim Bias in Large Language Models
- URL: http://arxiv.org/abs/2101.05783v2
- Date: Mon, 18 Jan 2021 17:02:28 GMT
- Title: Persistent Anti-Muslim Bias in Large Language Models
- Authors: Abubakar Abid, Maheen Farooqi, James Zou
- Abstract summary: GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias.
We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation.
For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to "money" in 5% of test cases.
- Score: 13.984800635696566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been observed that large-scale language models capture undesirable
societal biases, e.g. relating to race and gender; yet religious bias has been
relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual
language model, captures persistent Muslim-violence bias. We probe GPT-3 in
various ways, including prompt completion, analogical reasoning, and story
generation, to understand this anti-Muslim bias, demonstrating that it appears
consistently and creatively in different uses of the model and that it is
severe even compared to biases about other religious groups. For instance,
"Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is
mapped to "money" in 5% of test cases. We quantify the positive distraction
needed to overcome this bias with adversarial text prompts, and find that using
the six most positive adjectives reduces violent completions for "Muslims"
from 66% to 20%, which is still higher than for other religious groups.
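The abstract's measurement, comparing the rate of violent completions with and without a positive-adjective prefix, can be sketched as follows. This is a minimal illustration, not the paper's code: `toy_complete` is a stand-in for a real text-generation model (the paper used GPT-3), and both the violence keyword list and the adjective list are illustrative placeholders (the paper's exact six adjectives are not given in this abstract).

```python
# Illustrative sketch of measuring violent-completion rates, with and
# without a "positive distraction" prefix in the prompt.
VIOLENT_WORDS = {"shot", "killed", "attacked", "bomb", "terror"}  # illustrative list

# Hypothetical adjective set; the paper's actual six adjectives are not
# listed in this abstract.
POSITIVE_ADJECTIVES = ["calm", "peaceful", "kind", "generous", "honest", "hard-working"]

def is_violent(completion: str) -> bool:
    # Crude keyword check over punctuation-stripped tokens.
    tokens = {t.strip(".,!?") for t in completion.lower().split()}
    return any(w in tokens for w in VIOLENT_WORDS)

def violent_rate(complete, prompt: str, n: int = 100) -> float:
    # Fraction of n sampled completions flagged as violent.
    return sum(is_violent(complete(prompt)) for _ in range(n)) / n

def debiased_prompt(group: str) -> str:
    # Prepend positive adjectives as an adversarial "distraction" prefix.
    adjs = ", ".join(POSITIVE_ADJECTIVES)
    return f"{group} people are {adjs}. Two {group}s walked into a"

def toy_complete(prompt: str) -> str:
    # Toy deterministic stand-in for a real model, for demonstration only.
    return "mosque and prayed" if "calm" in prompt else "bar and were attacked"

base = violent_rate(toy_complete, "Two Muslims walked into a", n=10)
debiased = violent_rate(toy_complete, debiased_prompt("Muslim"), n=10)
```

With a real stochastic model the two rates would be estimated from many sampled completions per prompt; the toy stub here only shows the bookkeeping.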
Related papers
- What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether such bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
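A logistic Bradley-Terry probe of this kind can be sketched as follows: a preference between two words is modeled as a logistic function of a learned weight vector applied to the difference of their hidden vectors. This is a minimal NumPy sketch on synthetic vectors, not the paper's implementation; the dimensions, data, and training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-ins for LLM hidden vectors of word pairs (A, B).
# y = 1 means word A is "preferred" over word B.
dim, n_pairs = 16, 200
h_a = rng.normal(size=(n_pairs, dim))
h_b = rng.normal(size=(n_pairs, dim))
true_w = rng.normal(size=dim)
y = ((h_a - h_b) @ true_w > 0).astype(float)  # synthetic preference labels

# Logistic Bradley-Terry probe: P(A preferred over B) = sigmoid(w . (h_A - h_B)).
x = h_a - h_b          # pairwise feature: difference of hidden vectors
w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    p = sigmoid(x @ w)
    w -= lr * x.T @ (p - y) / n_pairs  # gradient step on mean log-loss

acc = ((sigmoid(x @ w) > 0.5) == (y == 1.0)).mean()
```

Using the difference of hidden vectors makes the probe antisymmetric by construction: swapping A and B flips the sign of the logit, as a Bradley-Terry model requires.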
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
- Muslim-Violence Bias Persists in Debiased GPT Models [18.905135223612046]
Using common names associated with the religions in prompts increases several-fold the rate of violent completions.
Our results show the need for continual de-biasing of models.
arXiv Detail & Related papers (2023-10-25T19:39:58Z)
- Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West [19.286414041202818]
Large Language Models (LLMs) can encode societal biases, exposing their users to representational harms.
We quantify stereotypical bias in popular LLMs according to an Indian-centric frame and compare bias levels between the Indian and Western contexts.
We find that the majority of LLMs tested are strongly biased towards stereotypes in the Indian context, especially as compared to the Western context.
arXiv Detail & Related papers (2023-09-15T17:38:41Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- PACO: Provocation Involving Action, Culture, and Oppression [13.70482307997736]
In India, people identify with a particular group based on certain attributes such as religion.
The same religious groups are often provoked against each other.
Previous studies show the role of provocation in increasing tensions between India's two prominent religious groups: Hindus and Muslims.
arXiv Detail & Related papers (2023-03-19T04:39:36Z)
- Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts [24.633323508534254]
Using common names associated with the religions in prompts yields a highly significant increase in violent completions.
Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions.
Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.
arXiv Detail & Related papers (2022-08-08T20:59:16Z)
- Intersectional Bias in Causal Language Models [0.0]
We examine GPT-2 and GPT-NEO models, ranging in size from 124 million to 2.7 billion parameters.
We conduct an experiment combining up to three social categories - gender, religion and disability - into unconditional or zero-shot prompts.
Our results confirm earlier tests conducted with auto-regressive causal models, including the GPT family of models.
arXiv Detail & Related papers (2021-07-16T03:46:08Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.