Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
- URL: http://arxiv.org/abs/2412.11152v1
- Date: Sun, 15 Dec 2024 11:04:06 GMT
- Title: Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
- Authors: Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen
- Abstract summary: Most diffusion model-based methods use DDIM Inversion as the first stage before editing.
We propose a new inversion and sampling method named Dual-Schedule Inversion.
We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing.
- Score: 43.082008983889956
- Abstract: Text-conditional image editing is a practical AIGC task that has recently emerged with great commercial and academic value. For real image editing, most diffusion model-based methods use DDIM Inversion as the first stage before editing. However, DDIM Inversion often results in reconstruction failure, leading to unsatisfactory performance for downstream editing. To address this problem, we first analyze why the reconstruction via DDIM Inversion fails. We then propose a new inversion and sampling method named Dual-Schedule Inversion. We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing. Our work can achieve superior reconstruction and editing performance with the following advantages: 1) It can reconstruct real images perfectly without fine-tuning, and its reversibility is guaranteed mathematically. 2) The edited object/scene conforms to the semantics of the text prompt. 3) The unedited parts of the object/scene retain the original identity.
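The abstract's starting point is standard DDIM Inversion, whose update rule is well known; the sketch below illustrates that baseline and the approximation that can break reconstruction. It is not the paper's Dual-Schedule Inversion, and `eps_model`, `alphas_cumprod`, and `timesteps` are hypothetical placeholders for a real diffusion pipeline.
```python
# Minimal sketch of plain deterministic DDIM inversion (the baseline this paper
# builds on, not the proposed Dual-Schedule Inversion). `eps_model`,
# `alphas_cumprod`, and `timesteps` are hypothetical stand-ins.
import torch

@torch.no_grad()
def ddim_invert(x, eps_model, alphas_cumprod, timesteps):
    """Map a clean latent x_0 to a noisy latent x_T by running DDIM backwards.

    The usual derivation reuses eps_model(x, t_cur) in place of the unknown
    noise prediction at t_next; this linearization is only approximate for
    finite step sizes, which is one reason naive reconstruction can fail.
    """
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):  # t_next > t_cur
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(x, t_cur)                   # predicted noise at t_cur
        x0_pred = (x - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1.0 - a_next).sqrt() * eps
    return x
```
Running the corresponding DDIM sampler from the returned latent should ideally recover the original image; in practice the reused noise prediction makes the round trip inexact, which is the reconstruction failure the abstract refers to.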
Related papers
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
- KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing [15.831539388569473]
We propose KV Inversion, a method that can achieve satisfactory reconstruction performance and action editing.
Our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training.
arXiv Detail & Related papers (2023-09-28T17:07:30Z)
- FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing [0.0]
We propose FEC, which consists of three sampling methods, each designed for different editing types and settings.
FEC addresses two important goals in the image editing task, the first being successful reconstruction: sampling a generated result that preserves the texture and features of the original real image.
None of our sampling methods require fine-tuning of the diffusion model or time-consuming training on large-scale datasets.
arXiv Detail & Related papers (2023-09-26T13:43:06Z)
- Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method named Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z)
- Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based 2D image editing models to modify images of the 3D scene.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capabilities of pretrained diffusion models for image editing.
Existing methods either finetune the model or invert the image into the latent space of the pretrained model.
They suffer from two problems: unsatisfactory results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.