Related papers: From Structured Prompts to Open Narratives: Measuring Gender Bias in LLMs Through Open-Ended Storytelling

From Structured Prompts to Open Narratives: Measuring Gender Bias in LLMs Through Open-Ended Storytelling

URL: http://arxiv.org/abs/2503.15904v1
Date: Thu, 20 Mar 2025 07:15:45 GMT
Title: From Structured Prompts to Open Narratives: Measuring Gender Bias in LLMs Through Open-Ended Storytelling
Authors: Evan Chen, Run-Jun Zhan, Yan-Bai Lin, Hung-Hsuan Chen,
Abstract summary: Large Language Models (LLMs) have revolutionized natural language processing, yet concerns persist regarding their tendency to reflect or amplify social biases.<n>This study introduces a novel evaluation framework to uncover gender biases in LLMs, focusing on their occupational narratives.
Score: 2.4374097382908477
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have revolutionized natural language processing, yet concerns persist regarding their tendency to reflect or amplify social biases present in their training data. This study introduces a novel evaluation framework to uncover gender biases in LLMs, focusing on their occupational narratives. Unlike previous methods relying on structured scenarios or carefully crafted prompts, our approach leverages free-form storytelling to reveal biases embedded in the models. Systematic analyses show an overrepresentation of female characters across occupations in six widely used LLMs. Additionally, our findings reveal that LLM-generated occupational gender rankings align more closely with human stereotypes than actual labor statistics. These insights underscore the need for balanced mitigation strategies to ensure fairness while avoiding the reinforcement of new stereotypes.

Related papers

The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data [8.26034886618475]
We investigate whether Large Language Models can predict an individual's gender based solely on online shopping histories. Using a dataset of historical online purchases from users in the United States, we evaluate the ability of six LLMs to classify gender. Results indicate that while models can infer gender with moderate accuracy, their decisions are often rooted in stereotypical associations between product categories and gender.
arXiv Detail & Related papers (2025-04-02T17:56:08Z)
Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings [1.5379084885764847]
Large language models (LLMs) are the foundation of the current successes of artificial intelligence (AI)<n>To effectively communicate the risks and encourage mitigation efforts these models need adequate and intuitive descriptions of their discriminatory properties.<n>We suggest bias profiles with respect to stereotype dimensions based on dictionaries from social psychology research.
arXiv Detail & Related papers (2024-11-25T16:14:45Z)
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency in Large Language Model (LLM)-generated content.<n>We introduce the Language Agency Bias Evaluation benchmark, which comprehensively evaluates biases in LLMs.<n>Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses. We propose an indirect probing framework based on conditional generation. We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z)
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping. We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses. We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content. This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs) We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
Gender bias and stereotypes in Large Language Models [0.6882042556551611]
This paper investigates Large Language Models' behavior with respect to gender stereotypes. We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias. Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize
arXiv Detail & Related papers (2023-08-28T22:32:05Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.