The Male CEO and the Female Assistant: Gender Biases in Text-To-Image Generation of Dual Subjects
- URL: http://arxiv.org/abs/2402.11089v2
- Date: Wed, 19 Jun 2024 17:23:44 GMT
- Title: The Male CEO and the Female Assistant: Gender Biases in Text-To-Image Generation of Dual Subjects
- Authors: Yixin Wan, Kai-Wei Chang,
- Abstract summary: We propose the Paired Stereotype Test (PST) framework to systematically evaluate T2I models in dual-subject generation setting.
PST is a dual-subject generation task, i.e. generating two people in the same image.
We show that despite generating seemingly fair or even anti-stereotype single-person images, DALLE-3 still shows notable biases under PST.
- Score: 58.27353205269664
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent large-scale T2I models like DALLE-3 have made progress on improving fairness in single-subject generation, i.e. generating a one-person image. However, we reveal that these improved models still demonstrate considerable biases when simply generating two people. To systematically evaluate T2I models in this challenging generation setting, we propose the Paired Stereotype Test (PST) framework, established as a dual-subject generation task, i.e. generating two people in the same image. The setting in PST is especially challenging, as the two individuals are described with social identities that are male-stereotyped and female-stereotyped, respectively, e.g. "a CEO" and "an Assistant". It is easy for T2I models to unfairly follow gender stereotypes in this contrastive setting. We establish a metric, Stereotype Score (SS), to quantitatively measure the adherence to gender stereotypes in generated images. Using PST, we evaluate two aspects of gender biases in DALLE-3 -- the widely-identified bias in gendered occupation, as well as a novel aspect: bias in organizational power. Results show that despite generating seemingly fair or even anti-stereotype single-person images, DALLE-3 still shows notable biases under PST -- for instance, in experiments on gender-occupational stereotypes, over 74% model generations demonstrate biases. Moreover, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. Our work pioneers the research on bias in dual-subject generation, and our proposed PST framework can be easily extended for further experiments, establishing a valuable contribution.
Related papers
- Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder [14.164976259534143]
Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects.<n>This paper presents SAE Debias, a model-agnostic framework for mitigating such bias in T2I generation.<n>To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models.
arXiv Detail & Related papers (2025-07-28T16:36:13Z) - A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models [45.55471356313678]
This paper presents the first large-scale study on gender bias in text-to-image (T2I) models.
We create a dataset of 3,217 gender-neutral prompts and generate 200 images per prompt from five leading T2I models.
We automatically detect the perceived gender of people in the generated images and filter out images with no person or multiple people of different genders.
arXiv Detail & Related papers (2025-03-30T11:11:51Z) - Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? [11.101062595569854]
Previous studies have shown that Text-to-Image (T2I) models can perpetuate or even amplify gender stereotypes when provided with neutral text prompts.
No existing work comprehensively compares the various detectors and understands how the gender bias detected by them deviates from the actual situation.
This study addresses this gap by validating previous gender bias detectors using a manually labeled dataset and comparing how the bias identified by various detectors deviates from the actual bias in T2I models.
arXiv Detail & Related papers (2025-01-27T04:47:19Z) - Gender Bias Evaluation in Text-to-image Generation: A Survey [25.702257177921048]
We review recent work on gender bias evaluation in text-to-image generation.
We focus on the evaluation of recent popular models such as Stable Diffusion and DALL-E 2.
arXiv Detail & Related papers (2024-08-21T06:01:23Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation [47.770531682802314]
Even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images.
We present the first extensive survey on bias in T2I generative models.
We discuss how these works define, evaluate, and mitigate different aspects of bias.
arXiv Detail & Related papers (2024-04-01T10:19:05Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - Towards Understanding Gender-Seniority Compound Bias in Natural Language
Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z) - Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias
in Image Search [8.730027941735804]
We study a unique gender bias in image search.
The search images are often gender-imbalanced for gender-neutral natural language queries.
We introduce two novel debiasing approaches.
arXiv Detail & Related papers (2021-09-12T04:47:33Z) - Stereotype and Skew: Quantifying Gender Bias in Pre-trained and
Fine-tuned Language Models [5.378664454650768]
This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models.
We find evidence that gender stereotype correlates approximately negatively with gender skew in out-of-the-box models, suggesting that there is a trade-off between these two forms of bias.
arXiv Detail & Related papers (2021-01-24T10:57:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.