ITI-GEN: Inclusive Text-to-Image Generation
- URL: http://arxiv.org/abs/2309.05569v1
- Date: Mon, 11 Sep 2023 15:54:30 GMT
- Title: ITI-GEN: Inclusive Text-to-Image Generation
- Authors: Cheng Zhang and Xuanbai Chen and Siqi Chai and Chen Henry Wu and
Dmitry Lagun and Thabo Beeler and Fernando De la Torre
- Abstract summary: This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
- Score: 56.72212367905351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generative models often reflect the biases of the training
data, leading to unequal representations of underrepresented groups. This study
investigates inclusive text-to-image generative models that generate images
based on human-written prompts and ensure the resulting images are uniformly
distributed across attributes of interest. Unfortunately, directly expressing
the desired attributes in the prompt often leads to sub-optimal results due to
linguistic ambiguity or model misrepresentation. Hence, this paper proposes a
drastically different approach that adheres to the maxim that "a picture is
worth a thousand words". We show that, for some attributes, images can
represent concepts more expressively than text. For instance, categories of
skin tones are typically hard to specify by text but can be easily represented
by example images. Building upon these insights, we propose a novel approach,
ITI-GEN, that leverages readily available reference images for Inclusive
Text-to-Image GENeration. The key idea is learning a set of prompt embeddings
to generate images that can effectively represent all desired attribute
categories. More importantly, ITI-GEN requires no model fine-tuning, making it
computationally efficient to augment existing text-to-image models. Extensive
experiments demonstrate that ITI-GEN largely improves over state-of-the-art
models in generating inclusive images from a prompt. Project page:
https://czhang0528.github.io/iti-gen.
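To make the mechanism described in the abstract concrete, below is a rough Python sketch of the stated idea: the pretrained text-to-image model stays frozen, and only a small set of prompt-token embeddings per attribute category is learned from reference images. The encoder stubs, dimensions, category names, and the cosine-alignment objective are illustrative assumptions, not the paper's actual components or training loss.

```python
# Minimal sketch (assumptions throughout): learn a few prompt-token embeddings per
# attribute category on top of frozen encoders, guided by reference-image features.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM, NUM_TOKENS = 512, 3                 # hypothetical sizes
categories = ["category_A", "category_B"]      # e.g. attribute categories such as skin tones

# Frozen stand-ins for the pretrained model's text and image encoders (not the real ones).
text_encoder = nn.Linear(EMBED_DIM, EMBED_DIM).requires_grad_(False)
image_encoder = nn.Linear(EMBED_DIM, EMBED_DIM).requires_grad_(False)

# The only trainable parameters: a small set of prompt-token embeddings per category.
inclusive_tokens = nn.ParameterDict({
    c: nn.Parameter(torch.randn(NUM_TOKENS, EMBED_DIM) * 0.02) for c in categories
})
optimizer = torch.optim.Adam(inclusive_tokens.parameters(), lr=1e-3)

prompt_embed = torch.randn(1, EMBED_DIM)                                # human-written prompt embedding (placeholder)
reference_images = {c: torch.randn(8, EMBED_DIM) for c in categories}   # per-category reference features (placeholder)

for step in range(100):
    loss = 0.0
    for c in categories:
        # Fold the learned category tokens into the prompt embedding and re-encode (encoder stays frozen).
        conditioned = text_encoder(prompt_embed + inclusive_tokens[c].mean(0, keepdim=True))
        img_feat = image_encoder(reference_images[c]).mean(0, keepdim=True)
        # Hedged objective: pull each category's conditioned prompt toward its reference images.
        loss = loss + (1 - F.cosine_similarity(conditioned, img_feat).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At inference time, the learned tokens for a chosen category would simply be combined with the user's prompt before it is passed to the unchanged generative model, which is what keeps the approach free of model fine-tuning.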
Related papers
- Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation" [41.94295877935867]
This study aims to reproduce the results presented in "ITI-GEN: Inclusive Text-to-Image Generation".
We show that ITI-GEN sometimes uses undesired attributes as proxy features and that it is unable to disentangle some pairs of correlated attributes, such as gender and baldness.
We propose using Hard Prompt Search with negative prompting, a method that requires no training and handles negation better than vanilla Hard Prompt Search (a minimal sketch of the negative-prompting idea follows the list below).
arXiv Detail & Related papers (2024-07-29T13:27:44Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex.
This paper investigates which word in the input prompt is responsible for bias in generated images.
arXiv Detail & Related papers (2023-06-03T21:39:07Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use and provides a better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality (CLIP image representations and the scaling of language models) do not consistently improve self-rationalization on tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences.
In this work, we repurpose such models to generate a descriptive text given an image at inference time.
The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z)
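The reproducibility study listed above proposes Hard Prompt Search with negative prompting. As a loose illustration of the negative-prompting mechanism only (the model id and prompts below are placeholder assumptions, not the study's actual setup), an off-the-shelf diffusion pipeline can steer generation away from an attribute through its negative prompt instead of relying on textual negation:

```python
# Illustrative only: express the unwanted attribute via negative_prompt rather than
# writing a negation ("a person without eyeglasses") into the main prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder model id
).to("cuda")

image = pipe(
    prompt="a headshot of a person",
    negative_prompt="eyeglasses",  # steer the sample away from the attribute
).images[0]
image.save("no_eyeglasses_sample.png")
```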
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.