Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations
- URL: http://arxiv.org/abs/2509.04515v1
- Date: Wed, 03 Sep 2025 00:25:25 GMT
- Title: Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations
- Authors: Martha O. Dimgba, Sharon Oba, Ameeta Agrawal, Philippe J. Giabbanelli
- Abstract summary: Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity.
This paper investigates gender and ethnicity biases in AI-generated occupational stories.
Our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), reveals improvements in demographic representation ranging from 2% to 20%.
- Score: 2.86989372262348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity. This paper investigates gender and ethnicity biases in AI-generated occupational stories. Representation biases are measured before and after applying our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), revealing improvements in demographic representation ranging from 2% to 20%. BAME leverages model-generated explanations to inform targeted prompt engineering, effectively reducing biases without modifying model parameters. By analyzing stories generated across 25 occupational groups, three large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, and GPT-4 Turbo), and multiple demographic dimensions, we identify persistent patterns of overrepresentation and underrepresentation linked to training data stereotypes. Our findings demonstrate that guiding models with their own internal reasoning mechanisms can significantly enhance demographic parity, thereby contributing to the development of more transparent generative AI systems.
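The abstract describes BAME only at a high level. As a rough illustration, a minimal sketch of such an explanation-driven loop might look like the following, where the `generate` callable, the prompt wording, and the demographic tagger are assumptions rather than the authors' implementation:

```python
# Hypothetical sketch of an explanation-driven mitigation loop in the spirit
# of BAME. The `generate` callable (an LLM client), the prompt wording, and
# the `tag` demographic tagger are illustrative assumptions.
from collections import Counter
from typing import Callable

def representation_shares(stories: list[str],
                          tag: Callable[[str], str]) -> dict[str, float]:
    """Fraction of stories whose protagonist falls in each demographic group."""
    counts = Counter(tag(s) for s in stories)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def bame_round(generate: Callable[[str], str], tag: Callable[[str], str],
               occupation: str, n: int = 50) -> dict:
    # 1) Baseline stories for one occupational group.
    base_prompt = f"Write a short story about a {occupation}."
    before = [generate(base_prompt) for _ in range(n)]
    # 2) Ask the model to explain its own demographic choices.
    explanation = generate(
        f"These are stories you wrote about a {occupation}:\n"
        + "\n---\n".join(before)
        + "\nExplain what drove the gender and ethnicity of the protagonists."
    )
    # 3) Fold the explanation back into a targeted prompt.
    guided_prompt = (
        f"{base_prompt} You previously explained your demographic defaults as: "
        f"{explanation!r}. Avoid those stereotyped defaults when choosing the "
        "protagonist's gender and ethnicity."
    )
    after = [generate(guided_prompt) for _ in range(n)]
    # 4) Compare representation before vs. after the intervention.
    return {"before": representation_shares(before, tag),
            "after": representation_shares(after, tag)}
```

Under this reading, the 2% to 20% improvements reported above would correspond to before/after shifts in these per-group shares relative to a reference distribution.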
Related papers
- Race, Ethnicity and Their Implication on Bias in Large Language Models [9.202525724606188]
We study how race and ethnicity are represented and operationalized within large language models (LLMs).
We find that demographic information is distributed across internal units with substantial cross-model variation.
Interventions suppressing such neurons reduce bias but leave substantial residual effects.
arXiv Detail & Related papers (2026-01-19T09:24:24Z)
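As an illustration of the kind of intervention this summary describes (an assumption, not the paper's released code), flagged hidden units can be zeroed with a PyTorch forward hook:

```python
# Sketch: suppress hidden units that probing has flagged as encoding
# demographic information. Layer path and unit indices are assumptions.
import torch

def make_suppression_hook(unit_indices: list[int]):
    """Return a hook that zeroes the flagged units in a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[..., unit_indices] = 0.0  # suppress demographic-encoding units
        return output
    return hook

# Hypothetical usage with a Hugging Face GPT-2-style model:
# layer = model.transformer.h[10]
# handle = layer.register_forward_hook(make_suppression_hook([37, 512, 901]))
# ... rerun the bias evaluation, then: handle.remove()
```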
- Exploring Bias in over 100 Text-to-Image Generative Models [49.60774626839712]
We investigate bias trends in text-to-image generative models over time, focusing on the increasing availability of models through open platforms like Hugging Face.
We assess bias across three key dimensions: (i) distribution bias, (ii) generative hallucination, and (iii) generative miss-rate.
Our findings indicate that artistic and style-transferred models exhibit significant bias, whereas foundation models, benefiting from broader training distributions, are becoming progressively less biased.
arXiv Detail & Related papers (2025-03-11T03:40:44Z)
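A minimal sketch of the first dimension named above, distribution bias, scored here as total variation distance from a uniform reference; the metric form and labels are illustrative assumptions, not the paper's code:

```python
# Distribution bias as total variation between the demographic attributes
# detected in generated images and a uniform reference (an assumed form).
from collections import Counter

def distribution_bias(detected: list[str], groups: list[str]) -> float:
    """Total variation from uniform: 0.0 is balanced, larger is more skewed."""
    counts = Counter(detected)
    n = len(detected)
    uniform = 1.0 / len(groups)
    return 0.5 * sum(abs(counts[g] / n - uniform) for g in groups)

# An 80/20 split over two gender presentations scores 0.3.
print(distribution_bias(["man"] * 80 + ["woman"] * 20, ["man", "woman"]))
```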
arXiv Detail & Related papers (2025-03-11T03:40:44Z) - Gender Encoding Patterns in Pretrained Language Model Representations [17.101242741559428]
Gender bias in pretrained language models (PLMs) poses significant social and ethical challenges.
This study adopts an information-theoretic approach to analyze how gender biases are encoded within various encoder-based architectures.
arXiv Detail & Related papers (2025-03-09T19:17:46Z)
- Biased Heritage: How Datasets Shape Models in Facial Expression Recognition [13.77824359359967]
We study bias propagation from datasets to trained models in image-based Facial Expression Recognition systems.
We introduce new bias metrics specifically designed for multiclass problems with multiple demographic groups.
Our findings suggest that preventing emotion-specific demographic patterns should be prioritized over general demographic balance in FER datasets.
arXiv Detail & Related papers (2025-03-05T12:25:22Z)
- The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI), which instructs the model to reflect on verbalized or retrieved factual information about the gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z)
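A hypothetical sketch of such a fact-augmented prompt: a diversity instruction is grounded with a retrieved demographic fact so the model does not diversify historically specific subjects into factual errors. The fact store and wording are assumptions, not the paper's implementation:

```python
# Fact-augmented prompting in the spirit of FAI (illustrative assumptions).
FACTS = {  # toy retrieved facts, keyed by generation subject
    "signers of the U.S. Declaration of Independence": "all signers were men",
}

def fact_augmented_prompt(subject: str) -> str:
    fact = FACTS.get(subject)
    guidance = f" Known composition: {fact}." if fact else ""
    return (
        f"Generate an image of {subject}."
        f"{guidance} Depict demographics faithfully when the subject is "
        "historically specific; otherwise, depict a diverse range of people."
    )

print(fact_augmented_prompt("signers of the U.S. Declaration of Independence"))
```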
- Less can be more: representational vs. stereotypical gender bias in facial expression recognition [3.9698529891342207]
Machine learning models can inherit biases from their training data, leading to discriminatory or inaccurate predictions.
This paper investigates the propagation of demographic biases from datasets into machine learning models.
We focus on the gender demographic component, analyzing two types of bias: representational and stereotypical.
arXiv Detail & Related papers (2024-06-25T09:26:49Z)
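The two bias types can be made concrete with toy metrics over (gender, label) pairs; these definitions are assumptions for illustration, not the paper's exact formulations:

```python
# Representational bias: imbalance in who appears in the data.
# Stereotypical bias: dependence between group and emotion label.
from collections import Counter

def representational_bias(samples) -> float:
    """Deviation of the gender marginal from a 50/50 split."""
    genders = Counter(g for g, _ in samples)
    return abs(genders["F"] / len(samples) - 0.5)

def stereotypical_bias(samples) -> float:
    """Largest gap, over labels, between label rates conditioned on gender."""
    by_gender = {"F": Counter(), "M": Counter()}  # assumed binary annotation
    totals = Counter(g for g, _ in samples)
    for g, label in samples:
        by_gender[g][label] += 1
    labels = {label for _, label in samples}
    return max(abs(by_gender["F"][l] / totals["F"]
                   - by_gender["M"][l] / totals["M"]) for l in labels)

data = [("F", "happy")] * 60 + [("F", "angry")] * 40 + \
       [("M", "happy")] * 40 + [("M", "angry")] * 60
print(representational_bias(data), stereotypical_bias(data))  # 0.0 0.2
```

The example shows why "less can be more": the dataset is perfectly balanced representationally yet still carries a stereotypical association between gender and emotion.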
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z)
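A hedged sketch of the kind of regularizer this summary suggests: similarities between an input's representation and prototype embeddings of demographic texts are softmaxed into a distribution whose divergence from uniform is penalized. Shapes and weighting are assumptions, not DAFair's published objective:

```python
# Prototype-based fairness regularizer (assumed form).
import torch
import torch.nn.functional as F

def demographic_regularizer(h: torch.Tensor, prototypes: torch.Tensor,
                            weight: float = 0.1) -> torch.Tensor:
    """h: (batch, dim) representations; prototypes: (n_groups, dim)."""
    sims = F.normalize(h, dim=-1) @ F.normalize(prototypes, dim=-1).T
    probs = F.softmax(sims, dim=-1)                  # (batch, n_groups)
    uniform = torch.full_like(probs, 1.0 / probs.size(-1))
    kl = (probs * (probs / uniform).log()).sum(-1).mean()
    return weight * kl  # add to the task loss during fine-tuning

# total_loss = task_loss + demographic_regularizer(h, prototypes)
```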
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
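The prompt-enumeration step described above can be sketched as a controlled grid; the marker and occupation lists here are illustrative, not the paper's exact vocabulary:

```python
# Cross gender and ethnicity markers with occupations to build a prompt grid
# for probing a text-to-image system (illustrative lists).
from itertools import product

GENDER_MARKERS = ["woman", "man", "non-binary person"]
ETHNICITY_MARKERS = ["Black", "East Asian", "Hispanic", "White"]
OCCUPATIONS = ["nurse", "CEO", "software engineer"]

prompts = [
    f"A photo of a {ethnicity} {gender} working as a {occupation}"
    for ethnicity, gender, occupation in product(
        ETHNICITY_MARKERS, GENDER_MARKERS, OCCUPATIONS)
]
print(len(prompts), prompts[0])  # 36 prompts to feed each TTI system
```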
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
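A sketch of the core operation (the paper's calibrated projection is more involved): project text embeddings onto the orthogonal complement of estimated bias directions. Inputs here are random stand-ins:

```python
# Project out estimated bias directions from text embeddings.
import numpy as np

def debias_projection(directions: np.ndarray) -> np.ndarray:
    """P = I - V (V^T V)^+ V^T, which projects out span(directions)."""
    V = directions.T                               # shape (dim, n_directions)
    return np.eye(V.shape[0]) - V @ np.linalg.pinv(V.T @ V) @ V.T

rng = np.random.default_rng(0)
bias_dirs = rng.standard_normal((2, 512))  # stand-ins for gendered embeddings
P = debias_projection(bias_dirs)
z = rng.standard_normal(512)               # an embedding to debias
print(np.abs(bias_dirs @ (P @ z)).max())   # ~0: orthogonal to bias directions
```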
- The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We use a relatively small language model with the LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) risk manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)