DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models
- URL: http://arxiv.org/abs/2202.04053v3
- Date: Wed, 30 Aug 2023 18:41:01 GMT
- Title: DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models
- Authors: Jaemin Cho, Abhay Zala, Mohit Bansal
- Abstract summary: We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
- Score: 73.12069620086311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, DALL-E, a multimodal transformer language model, and its variants,
including diffusion models, have shown high-quality text-to-image generation
capabilities. However, despite the realistic image generation results, there
has not been a detailed analysis of how to evaluate such models. In this work,
we investigate the visual reasoning capabilities and social biases of different
text-to-image models, covering both multimodal transformer language models and
diffusion models. First, we measure three visual reasoning skills: object
recognition, object counting, and spatial relation understanding. For this, we
propose PaintSkills, a compositional diagnostic evaluation dataset that
measures these skills. Despite the high-fidelity image generation capability, a
large gap exists between the performance of recent models and the upper bound
accuracy in object counting and spatial relation understanding skills. Second,
we assess the gender and skin tone biases by measuring the gender/skin tone
distribution of generated images across various professions and attributes. We
demonstrate that recent text-to-image generation models learn specific biases
about gender and skin tone from web image-text pairs. We hope our work will
help guide future progress in improving text-to-image generation models on
visual reasoning skills and learning socially unbiased representations. Code
and data: https://github.com/j-min/DallEval
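The abstract describes the bias probe only at a high level: prompt the model with neutral profession/attribute phrases, annotate the perceived gender or skin tone of each generated image, and compare the resulting label distributions. The sketch below is a minimal illustration of that protocol under stated assumptions, not the paper's released code (see the DallEval repository above); generate_images, classify_attribute, the prompt template, and the profession list are hypothetical placeholders to be replaced with an actual text-to-image model and annotator.

```python
from collections import Counter

# Hypothetical stand-ins: plug in any text-to-image model and any
# automated (or human) annotator for the perceived attribute.
def generate_images(prompt: str, n: int) -> list:
    raise NotImplementedError("call your text-to-image model here")

def classify_attribute(image) -> str:
    raise NotImplementedError("e.g. a perceived-gender or skin-tone annotator")

PROMPT_TEMPLATE = "a photo of a {profession}"  # neutral prompt, no gender cue
PROFESSIONS = ["nurse", "software developer", "chef", "security guard"]

def attribute_distribution(profession: str, n_images: int = 8) -> Counter:
    """Generate images for a neutral profession prompt and count the
    perceived attribute labels (e.g. 'male' / 'female' / 'unclear')."""
    prompt = PROMPT_TEMPLATE.format(profession=profession)
    labels = [classify_attribute(img) for img in generate_images(prompt, n_images)]
    return Counter(labels)

def skew(dist: Counter, a: str = "male", b: str = "female") -> float:
    """Share of label `a` among images labeled `a` or `b`; 0.5 means parity."""
    total = dist[a] + dist[b]
    return dist[a] / total if total else float("nan")

if __name__ == "__main__":
    for profession in PROFESSIONS:
        dist = attribute_distribution(profession)
        print(profession, dict(dist), f"skew={skew(dist):.2f}")
```

A strong deviation of the skew from 0.5 across many professions is the kind of distributional evidence of learned gender bias that the paper reports; the same loop applies to skin-tone labels.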
Related papers
- Gender Bias Evaluation in Text-to-image Generation: A Survey [25.702257177921048]
We review recent work on gender bias evaluation in text-to-image generation.
We focus on the evaluation of recent popular models such as Stable Diffusion and DALL-E 2.
arXiv Detail & Related papers (2024-08-21T06:01:23Z)
- Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images [10.385717398477414]
We present a new dataset, PAIRS (PArallel Images for eveRyday Scenarios).
The PAIRS dataset contains sets of AI-generated images of people that are highly similar in background and visual content but differ along the dimensions of gender and race.
By querying LVLMs with such images, we observe significant differences in the responses depending on the perceived gender or race of the person depicted.
arXiv Detail & Related papers (2024-02-08T16:11:23Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- New Job, New Gender? Measuring the Social Bias in Image Generation Models [85.26441602999014]
Image generation models are susceptible to generating content that perpetuates social stereotypes and biases.
We propose BiasPainter, a framework that can accurately, automatically and comprehensively trigger social bias in image generation models.
BiasPainter can achieve 90.8% accuracy on automatic bias detection, which is significantly higher than the results reported in previous work.
arXiv Detail & Related papers (2024-01-01T14:06:55Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect of adding ethical interventions on the diversity of the generated images.
Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as 'irrespective of gender'.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality (CLIP image representations and the scaling of language models) do not consistently improve multimodal self-rationalization on tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.