Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
- URL: http://arxiv.org/abs/2310.01506v2
- Date: Thu, 19 Oct 2023 13:02:21 GMT
- Title: Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
- Authors: Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
- Abstract summary: "Direct Inversion" is a novel technique achieving optimal performance of both branches with just three lines of code.
We present PIE-Bench, an editing benchmark with 700 images showcasing diverse scenes and editing types.
Compared to state-of-the-art optimization-based inversion techniques, our solution not only yields superior performance across 8 editing methods but also achieves nearly an order of speed-up.
- Score: 19.988947272980848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided diffusion models have revolutionized image generation and
editing, offering exceptional realism and diversity. Specifically, in the
context of diffusion-based editing, where a source image is edited according to
a target prompt, the process commences by acquiring a noisy latent vector
corresponding to the source image via the diffusion model. This vector is
subsequently fed into separate source and target diffusion branches for
editing. The accuracy of this inversion process significantly impacts the final
editing outcome, influencing both essential content preservation of the source
image and edit fidelity according to the target prompt. Prior inversion
techniques aimed at finding a unified solution in both the source and target
diffusion branches. However, our theoretical and empirical analyses reveal that
disentangling these branches leads to a distinct separation of responsibilities
for preserving essential content and ensuring edit fidelity. Building on this
insight, we introduce "Direct Inversion," a novel technique achieving optimal
performance of both branches with just three lines of code. To assess image
editing performance, we present PIE-Bench, an editing benchmark with 700 images
showcasing diverse scenes and editing types, accompanied by versatile
annotations and comprehensive evaluation metrics. Compared to state-of-the-art
optimization-based inversion techniques, our solution not only yields superior
performance across 8 editing methods but also achieves nearly an order of
speed-up.
Related papers
- Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing [67.96788532285649]
We present a Vision-guided and Mask-enhanced Adaptive Editing (ViMAEdit) method with three key novel designs.
First, we propose to leverage image embeddings as explicit guidance to enhance the conventional textual prompt-based denoising process.
Second, we devise a self-attention-guided iterative editing area grounding strategy.
arXiv Detail & Related papers (2024-10-14T13:41:37Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the edit-friendly'' DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z) - Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as an external knowledge to confine editing within a Mask-informed'' region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z) - Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - BARET : Balanced Attention based Real image Editing driven by
Target-text Inversion [36.59406959595952]
We propose a novel editing technique that only requires an input image and target text for various editing types including non-rigid edits without fine-tuning diffusion model.
Our method contains three novelties: (I) Targettext Inversion Schedule (TTIS) is designed to fine-tune the input target text embedding to achieve fast image reconstruction without image caption and acceleration of convergence; (II) Progressive Transition Scheme applies progressive linear approaches between target text embedding and its fine-tuned version to generate transition embedding for maintaining non-rigid editing capability; (III) Balanced Attention Module (BAM) balances the tradeoff between textual description and image semantics
arXiv Detail & Related papers (2023-12-09T07:18:23Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot
Text-based Video Editing [27.661609140918916]
InFusion is a framework for zero-shot text-based video editing.
It supports editing of multiple concepts with pixel-level control over diverse concepts mentioned in the editing prompt.
Our framework is a low-cost alternative to one-shot tuned models for editing since it does not require training.
arXiv Detail & Related papers (2023-07-22T17:05:47Z) - Eliminating Contextual Prior Bias for Semantic Image Editing via
Dual-Cycle Diffusion [35.95513392917737]
A novel approach called Dual-Cycle Diffusion generates an unbiased mask to guide image editing.
Our experiments demonstrate the effectiveness of the proposed method, as it significantly improves the D-CLIP score from 0.272 to 0.283.
arXiv Detail & Related papers (2023-02-05T14:30:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.