Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?
- URL: http://arxiv.org/abs/2501.15775v1
- Date: Mon, 27 Jan 2025 04:47:19 GMT
- Title: Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?
- Authors: Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo
- Abstract summary: Previous studies have shown that Text-to-Image (T2I) models can perpetuate or even amplify gender stereotypes when provided with neutral text prompts.
No existing work comprehensively compares these detectors or examines how the gender bias they report deviates from the actual bias in the models.
This study addresses this gap by validating previous gender bias detectors using a manually labeled dataset and comparing how the bias identified by various detectors deviates from the actual bias in T2I models.
- Score: 11.101062595569854
- Abstract: Text-to-Image (T2I) models have recently gained significant attention due to their ability to generate high-quality images and are consequently used in a wide range of applications. However, there are concerns about the gender bias of these models. Previous studies have shown that T2I models can perpetuate or even amplify gender stereotypes when provided with neutral text prompts. Researchers have proposed automated detectors that uncover gender bias in T2I models, but a crucial gap exists: no existing work comprehensively compares these detectors or examines how the bias they report deviates from the actual bias in T2I models. This study addresses this gap by validating previous gender bias detectors using a manually labeled dataset and comparing how the bias identified by various detectors deviates from the actual bias in T2I models, as verified by manual confirmation. We create a dataset consisting of 6,000 images generated from three cutting-edge T2I models: Stable Diffusion XL, Stable Diffusion 3, and Dreamlike Photoreal 2.0. During the human-labeling process, we find that all three T2I models generate a portion (12.48% on average) of low-quality images (e.g., images with no face present), where human annotators cannot determine the gender of the person. Our analysis reveals that all three T2I models show a preference for generating male images, with SDXL being the most biased. Additionally, images generated using prompts containing professional descriptions (e.g., lawyer or doctor) show the most bias. We evaluate seven gender bias detectors and find that none fully capture the actual level of bias in T2I models, with some detectors overestimating bias by up to 26.95%. We further investigate the causes of inaccurate estimations, highlighting the limitations of detectors in dealing with low-quality images. Based on our findings, we propose an enhanced detector...
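To make the evaluation setup above concrete, here is a minimal sketch of comparing a detector's bias estimate against human-verified labels while setting aside images whose gender is undeterminable. The `ImageLabel` structure, the male-proportion `bias_score`, and the signed deviation are illustrative assumptions, not the paper's exact formulation:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageLabel:
    """One generated image: gender assigned by a human annotator and by a
    detector; None means undeterminable (e.g., a low-quality, no-face image)."""
    human: Optional[str]
    detector: Optional[str]


def bias_score(genders: List[str]) -> float:
    """Male preference: proportion of 'male' among determinable labels
    (0.5 = balanced, 1.0 = all male)."""
    determinable = [g for g in genders if g in ("male", "female")]
    if not determinable:
        return float("nan")
    return sum(g == "male" for g in determinable) / len(determinable)


def detector_deviation(labels: List[ImageLabel]) -> float:
    """Signed gap between detector-estimated and human-verified bias;
    a positive value means the detector overestimates male bias."""
    actual = bias_score([l.human for l in labels if l.human is not None])
    estimated = bias_score([l.detector for l in labels if l.detector is not None])
    return estimated - actual


# Toy usage: the detector mislabels one female image and also assigns a
# gender to a low-quality image that human annotators could not label.
labels = [
    ImageLabel(human="male", detector="male"),
    ImageLabel(human="female", detector="male"),
    ImageLabel(human=None, detector="male"),
]
print(detector_deviation(labels))  # 1.0 - 0.5 = 0.5 overestimation
```

In this toy example the deviation comes from exactly the failure modes the paper highlights: misclassified genders and low-quality images that detectors label anyway.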
Related papers
- How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias [9.574645433491225]
We evaluate the gender and ethnicity bias exposed by three versions of the Stable Diffusion model towards software engineering tasks.
Results show how all models are significantly biased towards male figures when representing software engineers.
However, all models significantly under-represent Black and Arab figures, regardless of the prompt style used.
arXiv Detail & Related papers (2025-01-15T18:57:17Z)
- Gender Bias Evaluation in Text-to-image Generation: A Survey [25.702257177921048]
We review recent work on gender bias evaluation in text-to-image generation.
We focus on the evaluation of recent popular models such as Stable Diffusion and DALL-E 2.
arXiv Detail & Related papers (2024-08-21T06:01:23Z)
- Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation [47.770531682802314]
Even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images.
We present the first extensive survey on bias in T2I generative models.
We discuss how these works define, evaluate, and mitigate different aspects of bias.
arXiv Detail & Related papers (2024-04-01T10:19:05Z)
- The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects [58.27353205269664]
We propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities.
Using PST, we evaluate two aspects of gender biases -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power.
arXiv Detail & Related papers (2024-02-16T21:32:27Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation [10.542861450223128]
We find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments in corpora from three domains.
We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models.
arXiv Detail & Related papers (2021-09-08T18:14:11Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.