Latent Inversion with Timestep-aware Sampling for Training-free
Non-rigid Editing
- URL: http://arxiv.org/abs/2402.08601v2
- Date: Wed, 14 Feb 2024 08:03:00 GMT
- Title: Latent Inversion with Timestep-aware Sampling for Training-free
Non-rigid Editing
- Authors: Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye
- Abstract summary: We propose a training-free approach for non-rigid editing with Stable Diffusion.
Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling.
We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
- Score: 60.65516454338772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided non-rigid editing involves complex edits for input images, such
as changing motion or compositions within their surroundings. Since it requires
manipulating the input structure, existing methods often struggle with
preserving object identity and background, particularly when combined with
Stable Diffusion. In this work, we propose a training-free approach for
non-rigid editing with Stable Diffusion, aimed at improving the identity
preservation quality without compromising editability. Our approach comprises
three stages: text optimization, latent inversion, and timestep-aware text
injection sampling. Inspired by the recent success of Imagic, we employ their
text optimization for smooth editing. Then, we introduce latent inversion to
preserve the input image's identity without additional model fine-tuning. To
fully utilize the input reconstruction ability of latent inversion, we suggest
timestep-aware text injection sampling. This effectively retains the structure of
the input image by injecting the source text prompt in early sampling steps and
then transitioning to the target prompt in subsequent sampling steps. This
strategic approach seamlessly harmonizes with text optimization, facilitating
complex non-rigid edits to the input without losing the original identity. We
demonstrate the effectiveness of our method in terms of identity preservation,
editability, and aesthetic quality through extensive experiments.
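The timestep-aware text injection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the `switch_ratio` parameter, and the toy embedding vectors are assumptions standing in for the actual prompt-switching schedule and CLIP text embeddings.

```python
import numpy as np

def timestep_aware_prompt(step, total_steps, src_emb, tgt_emb, switch_ratio=0.3):
    # Early sampling steps use the source prompt embedding to retain the
    # input image's structure; later steps switch to the target prompt
    # so the edit can take effect. switch_ratio is a hypothetical knob.
    if step < switch_ratio * total_steps:
        return src_emb
    return tgt_emb

# Toy vectors standing in for source and target text embeddings.
src = np.zeros(4)
tgt = np.ones(4)

# Which embedding would condition each of 10 sampling steps.
schedule = [timestep_aware_prompt(s, 10, src, tgt) for s in range(10)]
```

In a real sampler, the selected embedding would condition the denoising network at each step; the sketch only shows the scheduling logic that the paper's third stage is built around.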
Related papers
- DragText: Rethinking Text Embedding in Point-based Image Editing [3.1923251959845214]
We show that during the progressive editing of an input image in a diffusion model, the text embedding remains constant.
We propose DragText, which optimizes the text embedding in conjunction with the dragging process to pair with the modified image embedding.
arXiv Detail & Related papers (2024-07-25T07:57:55Z)
- Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance [62.15866177242207]
We show that through constructing a subject-agnostic condition, one could obtain outputs consistent with both the given subject and input text prompts.
Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality improvements.
arXiv Detail & Related papers (2024-05-02T15:03:41Z)
- Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC)
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z)
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z)
- BARET: Balanced Attention based Real image Editing driven by Target-text Inversion [36.59406959595952]
We propose a novel editing technique that only requires an input image and target text for various editing types including non-rigid edits without fine-tuning diffusion model.
Our method contains three novelties: (I) Target-text Inversion Schedule (TTIS), which fine-tunes the input target text embedding to achieve fast image reconstruction without an image caption and to accelerate convergence; (II) Progressive Transition Scheme, which applies progressive linear interpolation between the target text embedding and its fine-tuned version to generate a transition embedding that maintains non-rigid editing capability; (III) Balanced Attention Module (BAM), which balances the tradeoff between textual description and image semantics.
arXiv Detail & Related papers (2023-12-09T07:18:23Z)
- Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and maintains a seamless workflow (less than 3 seconds on a single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
We exploit the capacities of pretrained diffusion models for editing images.
Existing methods either fine-tune the model or invert the image in the latent space of the pretrained model.
They suffer from two problems: unsatisfying results for selected regions, and unexpected changes in nonselected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z) - Null-text Inversion for Editing Real Images using Guided Diffusion
Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z) - Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation
with Wordless Training [178.09150600453205]
In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner.
Inspired by the prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from the masked motion.
Our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.
arXiv Detail & Related papers (2022-10-28T06:20:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.