Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models
- URL: http://arxiv.org/abs/2407.06917v3
- Date: Wed, 9 Oct 2024 11:17:46 GMT
- Title: Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models
- Authors: Zara Siddique, Liam D. Turner, Luis Espinosa-Anke
- Abstract summary: We use GlobalBias to study a broad set of stereotypes from around the world.
We generate character profiles based on given names and evaluate the prevalence of stereotypes in model outputs.
- Score: 9.734705470760511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have been shown to propagate and amplify harmful stereotypes, particularly those that disproportionately affect marginalised communities. To understand the effect of these stereotypes more comprehensively, we introduce GlobalBias, a dataset of 876k sentences incorporating 40 distinct gender-by-ethnicity groups alongside descriptors typically used in bias literature, which enables us to study a broad set of stereotypes from around the world. We use GlobalBias to directly probe a suite of LMs via perplexity, which we use as a proxy to determine how certain stereotypes are represented in the model's internal representations. Following this, we generate character profiles based on given names and evaluate the prevalence of stereotypes in model outputs. We find that the demographic groups associated with various stereotypes remain consistent across model likelihoods and model outputs. Furthermore, larger models consistently display higher levels of stereotypical outputs, even when explicitly instructed not to.
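To make the perplexity-probing step concrete, here is a minimal sketch (not the authors' code) of how sentences pairing a given name with a descriptor could be scored with an off-the-shelf causal LM. The model choice (gpt2) and the example sentences are illustrative assumptions, not taken from GlobalBias.
```python
# Minimal sketch of perplexity-based probing; illustrative only.
# Assumes: torch and transformers installed; GPT-2 stands in for the probed LMs;
# the example sentences are hypothetical, not drawn from the GlobalBias dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM could be probed the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(sentence: str) -> float:
    """Return the model's perplexity for a single sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean token-level
        # cross-entropy loss; exponentiating it gives perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Hypothetical probe: the same descriptor paired with different given names.
for s in ["Jenny is good at math.", "Jingzhen is good at math."]:
    print(f"{s!r}: perplexity = {perplexity(s):.2f}")
```
A lower perplexity for one name-descriptor pairing than another would suggest the model finds that association more likely, which is the kind of signal perplexity probing uses as a proxy for internal stereotype representations.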
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach [4.908389661988191]
This paper introduces the Multi-Grain Stereotype (MGS) dataset, consisting of 51,867 instances across gender, race, profession, religion, and other stereotypes.
We evaluate various machine learning approaches to establish baselines and fine-tune language models of different architectures and sizes.
We employ explainable AI (XAI) tools, including SHAP, LIME, and BertViz, to assess whether the model's learned patterns align with human intuitions about stereotypes.
arXiv Detail & Related papers (2024-04-02T09:31:32Z)
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
- SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models [15.145145928670827]
SeeGULL is a broad-coverage stereotype dataset in English.
It contains stereotypes about identity groups spanning 178 countries, 8 geo-political regions, and 6 continents.
We also include fine-grained offensiveness scores for different stereotypes and demonstrate their global disparities.
arXiv Detail & Related papers (2023-05-19T17:30:19Z)
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find that a broad range of ordinary prompts produces stereotypes, including prompts that simply mention traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We design a formalism that isolates these errors and use the resulting metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models [30.582132471411263]
We introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs).
CrowS-Pairs has 1,508 examples that cover stereotypes dealing with nine types of bias, such as race, religion, and age.
We find that all three of the widely used masked language models we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
- StereoSet: Measuring stereotypical bias in pretrained language models [24.020149562072127]
We present StereoSet, a large-scale natural dataset in English to measure stereotypical biases in four domains.
We evaluate popular models like BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases.
arXiv Detail & Related papers (2020-04-20T17:14:33Z)