Evaluating Gender Bias in Large Language Models
- URL: http://arxiv.org/abs/2411.09826v1
- Date: Thu, 14 Nov 2024 22:23:13 GMT
- Title: Evaluating Gender Bias in Large Language Models
- Authors: Michael Döll, Markus Döhring, Andreas Müller
- Abstract summary: The study examines the extent to which Large Language Models (LLMs) exhibit gender bias in pronoun selection in occupational contexts.
The jobs considered include a range of occupations, from those with a significant male presence to those with a notable female concentration.
The results show a positive correlation between the models' pronoun choices and the gender distribution present in U.S. labor force data.
- Score: 0.8636148452563583
- Abstract: Gender bias in artificial intelligence has become an important issue, particularly in the context of language models used in communication-oriented applications. This study examines the extent to which Large Language Models (LLMs) exhibit gender bias in pronoun selection in occupational contexts. The analysis evaluates the models GPT-4, GPT-4o, PaLM 2 Text Bison and Gemini 1.0 Pro using a self-generated dataset. The jobs considered include a range of occupations, from those with a significant male presence to those with a notable female concentration, as well as jobs with a relatively equal gender distribution. Three different sentence processing methods were used to assess potential gender bias: masked tokens, unmasked sentences, and sentence completion. In addition, the LLMs suggested names of individuals in specific occupations, which were then examined for gender distribution. The results show a positive correlation between the models' pronoun choices and the gender distribution present in U.S. labor force data. Female pronouns were more often associated with female-dominated occupations, while male pronouns were more often associated with male-dominated occupations. Sentence completion showed the strongest correlation with actual gender distribution, while name generation resulted in a more balanced 'politically correct' gender distribution, albeit with notable variations in predominantly male or female occupations. Overall, the prompting method had a greater impact on gender distribution than the model selection itself, highlighting the complexity of addressing gender bias in LLMs. The findings highlight the importance of prompting in gender mapping.
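To make the setup concrete, below is a minimal sketch of how the paper's three sentence-processing methods and the labor-data correlation could be approximated. It assumes the openai>=1.0 Python SDK with an OPENAI_API_KEY in the environment; the occupation list, prompt wording, pronoun counting, and BLS share values are illustrative placeholders, not the authors' exact dataset or protocol.

```python
# Minimal sketch of the study's setup, not the authors' code: prompts,
# occupations, and BLS shares below are illustrative assumptions.
import re

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

OCCUPATIONS = ["nurse", "carpenter", "software developer"]  # illustrative subset

# The three sentence-processing methods described in the abstract.
PROMPTS = {
    "masked": "Fill in [MASK] with a pronoun: The {job} said [MASK] would be late.",
    "unmasked": ("Which sentence is more likely? "
                 "(a) The {job} said she would be late. "
                 "(b) The {job} said he would be late."),
    "completion": "Complete the sentence: The {job} looked up from the desk and",
}

def female_pronoun_share(text: str) -> float:
    """Fraction of binary pronouns in a response that are female (simplified)."""
    tokens = re.findall(r"\b(she|her|hers|he|him|his)\b", text.lower())
    if not tokens:
        return float("nan")
    female = sum(t in {"she", "her", "hers"} for t in tokens)
    return female / len(tokens)

def query(prompt: str) -> str:
    """Single chat completion with low temperature for repeatability."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content or ""

# Female-pronoun share per occupation for one method (sentence completion).
llm_shares = [female_pronoun_share(query(PROMPTS["completion"].format(job=j)))
              for j in OCCUPATIONS]

# Placeholder U.S. labor-force female shares (not the paper's data).
bls_shares = [0.87, 0.03, 0.20]

# A positive correlation here would mirror the paper's headline finding.
r = np.corrcoef(llm_shares, bls_shares)[0, 1]
print(f"Pearson r between pronoun choice and labor data: {r:.2f}")
```

In the paper this comparison is run per model and per prompting method, which is how the authors can conclude that the prompting method matters more than the model choice.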
Related papers
- Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts [15.676219253088211]
We study gender equity within large language models (LLMs) through a decision-making lens.
We explore nine relationship configurations through name pairs across three name lists (men, women, neutral).
arXiv Detail & Related papers (2024-10-14T20:50:11Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which quantifies ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Gender bias in text corpora can lead to the perpetuation and amplification of societal inequalities.
Existing methods to measure gender representation bias in text corpora have mainly been proposed for English.
This paper introduces a novel methodology to quantitatively measure gender representation bias in Spanish corpora.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Gender Bias in Large Language Models across Multiple Languages [10.068466432117113]
We examine gender bias in the outputs that large language models (LLMs) generate for different languages.
We use three measurements: 1) gender bias in selecting descriptive words given a gender-related context; 2) gender bias in selecting gender-related pronouns (she/he) given descriptive words.
arXiv Detail & Related papers (2024-03-01T04:47:16Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z)
- Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation [10.542861450223128]
We find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments in corpora from three domains.
We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models.
arXiv Detail & Related papers (2021-09-08T18:14:11Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2 (a probe of this kind is sketched after this list).
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender-biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
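As referenced in the GPT-2 entry above, the following is a hedged sketch of the kind of occupational-bias probe that line of work describes, using the Hugging Face transformers pipeline; the prompt template, sample count, and pronoun tallying are illustrative assumptions, not that paper's protocol.

```python
# Sketch of a GPT-2 occupational-bias probe: sample completions for an
# occupation prompt and tally binary pronouns. Prompt wording and counting
# are illustrative, not the referenced paper's exact method.
import re
from collections import Counter

from transformers import pipeline, set_seed

set_seed(0)  # reproducible sampling
generator = pipeline("text-generation", model="gpt2")

def pronoun_tally(job: str, n_samples: int = 20) -> Counter:
    """Sample continuations for an occupation prompt and count pronouns."""
    prompt = f"The {job} said that"
    outs = generator(prompt, max_new_tokens=20, num_return_sequences=n_samples,
                     do_sample=True, pad_token_id=50256)
    tally = Counter()
    for out in outs:
        text = out["generated_text"][len(prompt):].lower()
        tally["female"] += len(re.findall(r"\b(she|her|hers)\b", text))
        tally["male"] += len(re.findall(r"\b(he|him|his)\b", text))
    return tally

print(pronoun_tally("nurse"))     # expected to skew female under societal bias
print(pronoun_tally("mechanic"))  # expected to skew male under societal bias
```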