Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment
- URL: http://arxiv.org/abs/2412.03400v1
- Date: Wed, 04 Dec 2024 15:31:30 GMT
- Title: Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment
- Authors: Feng He, Chao Zhang, Zhixue Zhao
- Abstract summary: We present Embedding-only Editing (Embedit), a method to efficiently adjust implicit assumptions and priors in a text-to-image model.
Embedit fine-tunes only the word token embedding (WTE) of the target object ("rose") to optimize the last hidden state of the text encoder.
Our method is highly efficient, modifying only 768 parameters for Stable Diffusion 1.4 and 2048 for XL in a single edit.
- Score: 8.231727133072866
- Abstract: Implicit assumptions and priors are often necessary in text-to-image generation tasks, especially when textual prompts lack sufficient context. However, these assumptions can sometimes reflect outdated concepts, inaccuracies, or societal biases embedded in the training data. We present Embedding-only Editing (Embedit), a method designed to efficiently adjust implicit assumptions and priors in the model without affecting its interpretation of unrelated objects or its overall performance. Given a "source" prompt (e.g., "rose") that elicits an implicit assumption (e.g., a rose is red) and a "destination" prompt that specifies the desired attribute (e.g., "blue rose"), Embedit fine-tunes only the word token embedding (WTE) of the target object ("rose") to optimize the last hidden state of the text encoder in Stable Diffusion, a SOTA text-to-image model. This targeted adjustment prevents unintended effects on other objects in the model's knowledge base, as the WTEs for unrelated objects and the model weights remain unchanged. Consequently, when a prompt does not contain the edited object, all representations and model outputs are identical to those of the original, unedited model. Our method is highly efficient, modifying only 768 parameters for Stable Diffusion 1.4 and 2048 for XL in a single edit, matching the WTE dimension of each respective model. This minimal scope, combined with rapid execution, makes Embedit highly practical for real-world applications. Additionally, changes are easily reversible by restoring the original WTE layers. Our experimental results demonstrate that Embedit consistently outperforms previous methods across various models, tasks, and editing scenarios (both single and sequential multiple edits), achieving an improvement of at least 6.01 percentage points (from 87.17% to 93.18%).
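The embedding-only idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the four-word vocabulary, the mean-pool-plus-linear "encoder", the squared-error loss, and the names `encode` and `edit_token` are all assumptions made for illustration, and the embedding width is 4 rather than 768 or 2048. It only shows the mechanics: one embedding row is trained so that the source prompt's hidden state approaches the destination prompt's, while every other row and the encoder weights stay frozen.

```python
import random

DIM = 4  # toy embedding width (the abstract cites 768 for SD 1.4 and 2048 for XL)
VOCAB = {"a": 0, "blue": 1, "rose": 2, "dog": 3}  # hypothetical vocabulary

rng = random.Random(0)
wte = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
# Frozen toy "encoder" weights; never updated during an edit.
W = [[1.0 if i == j else 0.1 for j in range(DIM)] for i in range(DIM)]


def encode(prompt, table):
    """Toy stand-in for the last hidden state: mean-pool embeddings, then a linear map."""
    ids = [VOCAB[word] for word in prompt.split()]
    pooled = [sum(table[t][j] for t in ids) / len(ids) for j in range(DIM)]
    return [sum(W[i][j] * pooled[j] for j in range(DIM)) for i in range(DIM)]


def edit_token(table, token, source, destination, lr=0.1, steps=300):
    """Gradient descent on a single embedding row so that
    encode(source) approaches encode(destination)."""
    target = encode(destination, table)   # computed with the unedited table
    edited = [row[:] for row in table]    # copy, so the original stays available
    row = edited[VOCAB[token]]            # the only parameters that are updated
    words = source.split()
    share = words.count(token) / len(words)  # d(pooled)/d(row) under mean pooling
    for _ in range(steps):
        residual = [h - t for h, t in zip(encode(source, edited), target)]
        # Gradient of ||encode(source) - target||^2 with respect to the row.
        grad = [2 * share * sum(W[i][j] * residual[i] for i in range(DIM))
                for j in range(DIM)]
        for j in range(DIM):
            row[j] -= lr * grad[j]
    return edited


edited = edit_token(wte, "rose", source="rose", destination="blue rose")
changed = sum(a != b for r0, r1 in zip(wte, edited) for a, b in zip(r0, r1))
print("parameters changed:", changed)
print("unrelated prompt unchanged:", encode("a dog", edited) == encode("a dog", wte))
```

Because `edit_token` returns a modified copy, reverting the edit in this sketch amounts to using the original `wte` table again, which mirrors the reversibility property claimed in the abstract.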
Related papers
- Learning Where to Edit Vision Transformers [27.038720045544867]
We propose a locate-then-edit approach for editing vision Transformers (ViTs) in computer vision.
We first address the where-to-edit challenge by meta-learning a hypernetwork on CutMix-augmented data.
To validate our method, we construct an editing benchmark that introduces subpopulation shifts towards natural underrepresented images and AI-generated images.
arXiv Detail & Related papers (2024-11-04T10:17:40Z)
- Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
Knowledge about different attributes is not localized in isolated components, but is instead distributed amongst a set of components in the conditional UNet.
We introduce a fast, data-free model editing method Diff-QuickFix which can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z)
- Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method named Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in the text embedding space of diffusion models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z)
- Unified Concept Editing in Diffusion Models [53.30378722979958]
We present a method that tackles all issues with a single approach.
Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution.
We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections.
arXiv Detail & Related papers (2023-08-25T17:59:59Z)
- Editing Implicit Assumptions in Text-to-Image Diffusion Models [48.542005079915896]
Text-to-image diffusion models often make implicit assumptions about the world when generating images.
In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model.
Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second.
arXiv Detail & Related papers (2023-03-14T17:14:21Z)
- Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors [53.819805242367345]
We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model.
GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights.
Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs.
arXiv Detail & Related papers (2022-11-20T17:18:22Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- Improving Factual Consistency in Summarization with Compression-Based Post-Editing [146.24839415743358]
We show that a model-agnostic way to address this problem is post-editing the generated summaries.
We propose to use sentence-compression data to train the post-editing model to take a summary with extrinsic entity errors marked with special tokens.
We show that this model improves factual consistency while maintaining ROUGE, improving entity precision by up to 30% on XSum, and that this model can be applied on top of another post-editor.
arXiv Detail & Related papers (2022-11-11T13:35:38Z)
- SITA: Single Image Test-time Adaptation [48.789568233682296]
In Test-time Adaptation (TTA), given a model trained on some source data, the goal is to adapt it to make better predictions for test instances from a different distribution.
We consider TTA in a more pragmatic setting, which we refer to as SITA (Single Image Test-time Adaptation).
Here, when making each prediction, the model has access only to the given single test instance, rather than a batch of instances.
We propose a novel approach, AugBN, for the SITA setting that requires only forward propagation.
arXiv Detail & Related papers (2021-12-04T15:01:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.