Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"
- URL: http://arxiv.org/abs/2407.19996v1
- Date: Mon, 29 Jul 2024 13:27:44 GMT
- Title: Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"
- Authors: Daniel Gallo Fernández, Răzvan-Andrei Matisan, Alejandro Monroy Muñoz, Janusz Partyka,
- Abstract summary: This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation".
We show that ITI-GEN sometimes uses undesired attributes as proxy features and is unable to disentangle some pairs of correlated attributes, such as gender and baldness.
We propose using Hard Prompt Search with negative prompting, a method that does not require training and that handles negation better than vanilla Hard Prompt Search.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text-to-image generative models often present issues regarding fairness with respect to certain sensitive attributes, such as gender or skin tone. This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation" by Zhang et al. (2023a), which introduces a model to improve inclusiveness in these kinds of models. We show that most of the claims made by the authors about ITI-GEN hold: it improves the diversity and quality of generated images, it is scalable to different domains, it has plug-and-play capabilities, and it is efficient from a computational point of view. However, ITI-GEN sometimes uses undesired attributes as proxy features and it is unable to disentangle some pairs of (correlated) attributes such as gender and baldness. In addition, when the number of considered attributes increases, the training time grows exponentially and ITI-GEN struggles to generate inclusive images for all elements in the joint distribution. To solve these issues, we propose using Hard Prompt Search with negative prompting, a method that does not require training and that handles negation better than vanilla Hard Prompt Search. Nonetheless, Hard Prompt Search (with or without negative prompting) cannot be used for continuous attributes that are hard to express in natural language, an area where ITI-GEN excels as it is guided by images during training. Finally, we propose combining ITI-GEN and Hard Prompt Search with negative prompting.
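As a concrete illustration of Hard Prompt Search with negative prompting, the sketch below enumerates the joint distribution of a few binary attributes and generates one image per combination with Hugging Face diffusers, passing the non-selected attribute values as a negative prompt. The base prompt, attribute lists, and checkpoint are illustrative assumptions, not the exact configuration used in the study.

```python
import itertools

import torch
from diffusers import StableDiffusionPipeline

# Hard Prompt Search with negative prompting: one hard prompt per element of the
# joint attribute distribution, with the non-selected values pushed away via the
# negative prompt. Attributes and checkpoint are illustrative, not the paper's setup.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

base_prompt = "a headshot of a person"
attributes = {
    "gender": ["male", "female"],
    "skin tone": ["pale skin", "dark skin"],
    "age": ["young", "old"],
}

# Enumerate every element of the joint distribution (2 x 2 x 2 = 8 combinations here).
for combo in itertools.product(*attributes.values()):
    prompt = f"{base_prompt}, {', '.join(combo)}"
    # Negative prompt: every attribute value that was NOT selected for this combination.
    negative_prompt = ", ".join(
        value
        for values, chosen in zip(attributes.values(), combo)
        for value in values
        if value != chosen
    )
    image = pipe(prompt=prompt, negative_prompt=negative_prompt,
                 num_inference_steps=30).images[0]
    image.save("_".join(combo).replace(" ", "-") + ".png")
```

The number of prompts still grows exponentially with the number of attributes, but no training is required for any of them, which is the trade-off discussed in the abstract above.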
Related papers
- Conditional Text-to-Image Generation with Reference Guidance [81.99538302576302]
This paper explores conditioning diffusion models on an additional reference image that provides visual guidance for the particular subjects to be generated.
We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references.
Our expert plugins demonstrate superior results to existing methods on all tasks, with each plugin containing only 28.55M trainable parameters.
arXiv Detail & Related papers (2024-11-22T21:38:51Z)
- FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation [28.185503858652456]
Prompt learning has emerged as the state-of-the-art (SOTA) approach for fair text-to-image (T2I) generation.
In this work, we first reveal that this prompt learning-based approach results in degraded sample quality.
We propose two ideas: (i) Prompt Queuing and (ii) Attention Amplification to address the quality issue.
arXiv Detail & Related papers (2024-10-24T10:16:09Z)
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining [58.379339799777064]
Large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks.
We propose a framework that not only mines in both directions but also generates challenging negative samples in both modalities.
Our code and dataset are released at https://ugorsahin.github.io/enhancing-multimodal-compositional-reasoning-of-vlm.html.
arXiv Detail & Related papers (2023-11-07T13:05:47Z)
- Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion [6.491645162078057]
Text-to-image (TTI) systems have made it possible to create realistic images with simple text prompts.
In all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference.
We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept.
arXiv Detail & Related papers (2023-10-31T18:05:15Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- Social Biases through the Text-to-Image Generation Lens [9.137275391251517]
Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software.
We take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images.
We present findings for two popular T2I models: DALLE-v2 and Stable Diffusion.
arXiv Detail & Related papers (2023-03-30T05:29:13Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images; a minimal usage sketch follows this entry.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
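For readers who want to try the attention-based guidance described in the Attend-and-Excite entry above, a minimal sketch using the re-implementation shipped with Hugging Face diffusers is given below; the checkpoint, prompt, and token indices are example values, not taken from the paper.

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

# Attend-and-Excite via the diffusers re-implementation; checkpoint, prompt, and
# token indices are example values rather than the paper's configuration.
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
print(pipe.get_indices(prompt))  # shows which token positions correspond to "cat" and "frog"

# Strengthen cross-attention on the "cat" and "frog" tokens during denoising
# to counteract catastrophic neglect of one of the subjects.
image = pipe(
    prompt=prompt,
    token_indices=[2, 5],
    guidance_scale=7.5,
    num_inference_steps=50,
    max_iter_to_alter=25,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("cat_and_frog.png")
```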
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect of adding ethical interventions on the diversity of the generated images.
Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as 'irrespective of gender'.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)