Editing Implicit Assumptions in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2303.08084v2
- Date: Fri, 25 Aug 2023 16:18:51 GMT
- Title: Editing Implicit Assumptions in Text-to-Image Diffusion Models
- Authors: Hadas Orgad, Bahjat Kawar, Yonatan Belinkov
- Abstract summary: Text-to-image diffusion models often make implicit assumptions about the world when generating images.
In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model.
Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second.
- Score: 48.542005079915896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models often make implicit assumptions about the
world when generating images. While some assumptions are useful (e.g., the sky
is blue), they can also be outdated, incorrect, or reflective of social biases
present in the training data. Thus, there is a need to control these
assumptions without requiring explicit user input or costly re-training. In
this work, we aim to edit a given implicit assumption in a pre-trained
diffusion model. Our Text-to-Image Model Editing method, TIME for short,
receives a pair of inputs: a "source" under-specified prompt for which the
model makes an implicit assumption (e.g., "a pack of roses"), and a
"destination" prompt that describes the same setting, but with a specified
desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's
cross-attention layers, as these layers assign visual meaning to textual
tokens. We edit the projection matrices in these layers such that the source
prompt is projected close to the destination prompt. Our method is highly
efficient, as it modifies a mere 2.2% of the model's parameters in under one
second. To evaluate model editing approaches, we introduce TIMED (TIME
Dataset), containing 147 source and destination prompt pairs from various
domains. Our experiments (using Stable Diffusion) show that TIME is successful
in model editing, generalizes well for related prompts unseen during editing,
and imposes minimal effect on unrelated generations.
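The cross-attention edit described above can be illustrated with a short sketch. The regularized least-squares objective, the `reg` strength, and the helper name `edit_projection` below are assumptions for illustration rather than the paper's exact procedure: the idea is to remap each source-prompt token embedding close to the value its destination counterpart would receive under the original weights, while keeping the new projection near the old one.
```python
# Minimal sketch (assumed objective, not the paper's exact procedure) of editing a
# single cross-attention key/value projection so that source-prompt tokens map
# close to the values their destination counterparts would receive.
import torch

def edit_projection(W_old: torch.Tensor,
                    c_src: torch.Tensor,
                    c_dst: torch.Tensor,
                    reg: float = 0.1) -> torch.Tensor:
    """
    W_old : (d_out, d_in) original key or value projection matrix.
    c_src : (n_tokens, d_in) text-encoder embeddings of the source prompt.
    c_dst : (n_tokens, d_in) embeddings of the destination prompt, assumed
            token-aligned with the source prompt for this sketch.
    Returns W_new minimizing the assumed objective
        sum_i ||W c_src_i - W_old c_dst_i||^2 + reg * ||W - W_old||_F^2,
    whose closed form is
        W_new = (reg * W_old + V* C^T) (reg * I + C C^T)^{-1},  C = c_src^T.
    """
    d_in = W_old.shape[1]
    v_star = c_dst @ W_old.T                       # (n_tokens, d_out) target values
    lhs = reg * W_old + v_star.T @ c_src           # (d_out, d_in)
    rhs = reg * torch.eye(d_in, dtype=W_old.dtype) + c_src.T @ c_src  # (d_in, d_in)
    return lhs @ torch.linalg.inv(rhs)

# Hypothetical usage on random tensors standing in for real weights/embeddings.
W = torch.randn(320, 768)   # e.g. a value projection in a Stable Diffusion block
src = torch.randn(6, 768)   # token embeddings of the source prompt
dst = torch.randn(6, 768)   # token embeddings of the destination prompt
W_edited = edit_projection(W, src, dst)
```
Applying an update of this kind only to the key and value projections of the cross-attention layers touches a small fraction of the network's weights and amounts to one linear solve per matrix, which is consistent with the efficiency figures quoted above (about 2.2% of parameters, under one second).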
Related papers
- DreamDistribution: Prompt Distribution Learning for Text-to-Image
Diffusion Models [53.17454737232668]
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts.
These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions.
We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
arXiv Detail & Related papers (2023-12-21T12:11:00Z)
- Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
We find that knowledge about different attributes is not localized in isolated components, but is instead distributed among a set of components in the conditional UNet.
We introduce a fast, data-free model editing method, Diff-QuickFix, which can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z)
- Reverse Stable Diffusion: What prompt was used to generate this image? [73.10116197883303]
We study the task of predicting the prompt embedding given an image generated by a generative diffusion model.
We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective.
We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion.
arXiv Detail & Related papers (2023-08-02T23:39:29Z)
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in generating images that accurately follow the prompt.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
- If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z)