A database to support the evaluation of gender biases in GPT-4o output
- URL: http://arxiv.org/abs/2502.20898v1
- Date: Fri, 28 Feb 2025 09:54:13 GMT
- Title: A database to support the evaluation of gender biases in GPT-4o output
- Authors: Luise Mehner, Lena Alicija Philine Fiedler, Sabine Ammon, Dorothea Kolossa,
- Abstract summary: A prominent ethical risk of Large Language Models (LLMs) is the generation of unfair language output.<n>We propose a novel approach to database construction to assess gender-related biases.
- Score: 4.517392236571035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread application of Large Language Models (LLMs) involves ethical risks for users and societies. A prominent ethical risk of LLMs is the generation of unfair language output that reinforces or exacerbates harm for members of disadvantaged social groups through gender biases (Weidinger et al., 2022; Bender et al., 2021; Kotek et al., 2023). Hence, the evaluation of the fairness of LLM outputs with respect to such biases is a topic of rising interest. To advance research in this field, promote discourse on suitable normative bases and evaluation methodologies, and enhance the reproducibility of related studies, we propose a novel approach to database construction. This approach enables the assessment of gender-related biases in LLM-generated language beyond merely evaluating their degree of neutralization.
Related papers
- Bridging the Fairness Gap: Enhancing Pre-trained Models with LLM-Generated Sentences [8.979854959662664]
We propose enhancing fairness (Fair-Gender) in pre-trained language models (PLMs) by absorbing coherent, attribute-balanced, and semantically rich sentences.
These sentences cannot be directly used for debiasing due to alignment issues and the risk of negative transfer.
We address this by applying causal analysis to estimate causal effects, filtering out unaligned sentences, and identifying aligned ones for incorporation into PLMs.
arXiv Detail & Related papers (2025-01-12T12:32:43Z) - Bias in Large Language Models: Origin, Evaluation, and Mitigation [4.606140332500086]
Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges.
This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies.
Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice.
arXiv Detail & Related papers (2024-11-16T23:54:53Z) - The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.<n>GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.<n>Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.<n>Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and
Addressing Sociological Implications [0.0]
The study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge.
The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of Large Language Models.
The paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques.
arXiv Detail & Related papers (2023-07-18T11:38:45Z) - Social Biases in Automatic Evaluation Metrics for NLG [53.76118154594404]
We propose an evaluation method based on Word Embeddings Association Test (WEAT) and Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics.
We construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image caption and text summarization tasks.
arXiv Detail & Related papers (2022-10-17T08:55:26Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.