Auditing Gender Presentation Differences in Text-to-Image Models
- URL: http://arxiv.org/abs/2302.03675v2
- Date: Wed, 8 Feb 2023 01:55:54 GMT
- Title: Auditing Gender Presentation Differences in Text-to-Image Models
- Authors: Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang
- Abstract summary: We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
- Score: 54.16959473093973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image models, which can generate high-quality images based on textual
input, have recently enabled various content-creation tools. Despite
significantly affecting a wide range of downstream applications, the
distributions of these generated images are still not fully understood,
especially when it comes to the potential stereotypical attributes of different
genders. In this work, we propose a paradigm (Gender Presentation Differences)
that utilizes fine-grained self-presentation attributes to study how gender is
presented differently in text-to-image models. By probing gender indicators in
the input text (e.g., "a woman" or "a man"), we quantify the frequency
differences of presentation-centric attributes (e.g., "a shirt" and "a dress")
through human annotation and introduce a novel metric: GEP. Furthermore, we
propose an automatic method to estimate such differences. The automatic GEP
metric based on our approach yields a higher correlation with human annotations
than that based on existing CLIP scores, consistently across three
state-of-the-art text-to-image models. Finally, we demonstrate the
generalization ability of our metrics in the context of gender stereotypes
related to occupations.
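As a concrete illustration of the frequency-difference idea described in the abstract, here is a minimal sketch: the attribute list, the attribute-detection input format, and the mean-absolute-difference aggregation are all illustrative assumptions, not the paper's exact GEP formulation.

```python
from collections import Counter

# Presentation-centric attributes are illustrative examples from the abstract;
# the paper's GEP metric is defined over its own attribute set and aggregation.
ATTRIBUTES = ["shirt", "dress", "tie", "skirt"]

def attribute_frequencies(images_attrs, attributes):
    """images_attrs: one set of detected attributes per generated image
    (detections could come from human annotators or an automatic model)."""
    counts = Counter()
    for attrs in images_attrs:
        counts.update(attrs & set(attributes))
    n = len(images_attrs)
    return {a: counts[a] / n for a in attributes}

def frequency_difference(woman_attrs, man_attrs, attributes=ATTRIBUTES):
    """Aggregate per-attribute frequency gaps between two gender indicators
    (mean absolute difference here; the actual GEP aggregation may differ)."""
    f_w = attribute_frequencies(woman_attrs, attributes)
    f_m = attribute_frequencies(man_attrs, attributes)
    return sum(abs(f_w[a] - f_m[a]) for a in attributes) / len(attributes)

# Toy usage: detections for images generated from "a woman ..." vs. "a man ..."
woman = [{"dress"}, {"dress", "skirt"}, {"shirt"}]
man = [{"shirt", "tie"}, {"shirt"}, {"shirt", "tie"}]
print(frequency_difference(woman, man))  # larger value = larger presentation gap
```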
Related papers
- GRADE: Quantifying Sample Diversity in Text-to-Image Models [66.12068246962762]
We propose GRADE: Granular Attribute Diversity Evaluation, an automatic method for quantifying sample diversity.
We measure the overall diversity of 12 T2I models using 400 concept-attribute pairs, revealing that all models display limited variation.
Our work proposes a modern, semantically-driven approach to measuring sample diversity and highlights the striking homogeneity in the outputs of T2I models.
arXiv Detail & Related papers (2024-10-29T23:10:28Z)
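A minimal sketch of the entropy-style diversity measure a GRADE-like evaluation can compute per concept-attribute pair; the normalized-entropy choice and the VQA-based attribute extraction mentioned in the comment are assumptions here, not necessarily GRADE's exact definition.

```python
import math
from collections import Counter

def normalized_entropy(attribute_values):
    """Diversity of one concept-attribute pair, e.g. the colors observed
    across many generations of "a car": 0 = a single value, 1 = uniform."""
    counts = Counter(attribute_values)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    k = len(counts)
    return h / math.log(k) if k > 1 else 0.0

# Toy usage: colors extracted (e.g., via a VQA model) from generations of "a car".
print(normalized_entropy(["red"] * 9 + ["blue"]))          # skewed -> low score
print(normalized_entropy(["red", "blue", "green", "gray"]))  # uniform -> 1.0
```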
- Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images [10.385717398477414]
We present a new dataset PAIRS (PArallel Images for eveRyday Scenarios)
The PAIRS dataset contains sets of AI-generated images of people, such that the images are highly similar in terms of background and visual content, but differ along the dimensions of gender and race.
By querying the LVLMs with such images, we observe significant differences in the responses according to the perceived gender or race of the person depicted.
arXiv Detail & Related papers (2024-02-08T16:11:23Z)
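A rough sketch of the kind of paired probe such a dataset enables: ask the same question about two parallel images and compare the answers. The token-overlap comparison and the `query_vlm` name in the comments are hypothetical stand-ins; the paper analyzes responses in its own way.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two free-text answers (1.0 = identical sets)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

# Toy usage: answers from a vision-language model to the same question about
# two parallel images that differ only in perceived gender.
# (query_vlm is a hypothetical stand-in for whatever LVLM API is used.)
question = "What is this person's job?"
answer_f = "She is probably a nurse"   # e.g. query_vlm(img_female, question)
answer_m = "He is probably a doctor"   # e.g. query_vlm(img_male, question)
print(f"response overlap: {jaccard(answer_f, answer_m):.2f}")  # low = divergent
```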
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases, just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness [0.932065750652415]
TIDA (Targeted Image-editing Data Augmentation) is a data augmentation method focused on improving models' human-like abilities.
We show that a TIDA-enhanced dataset targeting gender, color, and counting abilities yields better performance on several image captioning metrics.
arXiv Detail & Related papers (2023-09-27T20:12:41Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
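A minimal sketch of the resolution-accuracy gap such a benchmark can expose; the field names and the binary feminine/masculine labels are illustrative, not VisoGender's actual schema.

```python
def resolution_accuracy_gap(examples):
    """examples: (perceived_gender, predicted_pronoun, gold_pronoun) triples
    from a pronoun-resolution benchmark. Returns per-gender accuracy and
    the absolute gap between the two."""
    correct = {"feminine": 0, "masculine": 0}
    total = {"feminine": 0, "masculine": 0}
    for gender, pred, gold in examples:
        total[gender] += 1
        correct[gender] += int(pred == gold)
    acc = {g: correct[g] / total[g] for g in total if total[g]}
    gap = abs(acc.get("feminine", 0.0) - acc.get("masculine", 0.0))
    return acc, gap

# Toy usage: a model resolving the pronoun in "The doctor and ___ patient".
examples = [
    ("feminine", "his", "her"), ("feminine", "her", "her"),
    ("masculine", "his", "his"), ("masculine", "his", "his"),
]
print(resolution_accuracy_gap(examples))  # ({'feminine': 0.5, 'masculine': 1.0}, 0.5)
```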
- Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities [19.03751960721954]
We explore the extent to which zero-shot vision-language models exhibit gender bias for different vision tasks.
We evaluate different vision-language models with multiple datasets across a set of concepts.
arXiv Detail & Related papers (2023-01-26T13:44:31Z)
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect of adding ethical interventions on the diversity of the generated images.
Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases, such as 'irrespective of gender'.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
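A minimal sketch of the distribution measurement DALL-Eval describes: classify the perceived gender (or skin tone) of each generated image, then compare the label distribution to a uniform reference. The classifier itself is out of scope here, and the maximum-deviation summary is an assumption, not DALL-Eval's exact statistic.

```python
from collections import Counter

def distribution_skew(labels):
    """labels: gender/skin-tone labels predicted for images generated from a
    neutral prompt (e.g., "a photo of a person"). Returns each label's share
    and the maximum deviation from a uniform distribution."""
    counts = Counter(labels)
    n = len(labels)
    shares = {label: c / n for label, c in counts.items()}
    uniform = 1.0 / len(counts)
    skew = max(abs(s - uniform) for s in shares.values())
    return shares, skew

# Toy usage: predicted genders for 8 generations of a neutral prompt.
labels = ["man"] * 6 + ["woman"] * 2
print(distribution_skew(labels))  # ({'man': 0.75, 'woman': 0.25}, 0.25)
```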
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.