DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
- URL: http://arxiv.org/abs/2307.02421v2
- Date: Mon, 20 Nov 2023 09:08:42 GMT
- Title: DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
- Authors: Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang
- Abstract summary: We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
- Score: 66.43179841884098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the ability of existing large-scale text-to-image (T2I) models to
generate high-quality images from detailed textual descriptions, they often
lack the ability to precisely edit the generated or real images. In this paper,
we propose a novel image editing method, DragonDiffusion, enabling Drag-style
manipulation on Diffusion models. Specifically, we construct classifier
guidance based on the strong correspondence of intermediate features in the
diffusion model. This guidance transforms editing signals into gradients via a
feature correspondence loss that modifies the intermediate representation of
the diffusion model. Based on this guidance strategy, we also build multi-scale
guidance that considers both semantic and geometric alignment. Moreover,
cross-branch self-attention is added to maintain consistency between the
original image and the editing result. Through an efficient design, our method
achieves various editing modes for generated or real images, such as object
moving, object resizing, object appearance replacement, and content dragging.
It is worth noting that all editing and content preservation signals come from
the image itself, and the model does not require fine-tuning or additional
modules. Our source code will be available at
https://github.com/MC-E/DragonDiffusion.
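The abstract's core mechanism, turning a feature-correspondence loss into gradients that steer the diffusion latent, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration and not the authors' code: `unet_with_features`, the boolean region masks, and the diffusers-style `scheduler` are assumptions introduced for this example.

```python
# Minimal sketch of feature-correspondence guidance in the spirit of the
# paper's description. NOT the authors' implementation: `unet_with_features`,
# the mask tensors, and the diffusers-style `scheduler` are assumptions.
import torch
import torch.nn.functional as F

def correspondence_loss(feat_edit, feat_ref, target_mask, source_mask):
    """Cosine distance between the mean reference feature inside the source
    (handle) region and the mean editing-branch feature inside the target region."""
    c = feat_ref.shape[1]
    # Boolean masks of shape (B, H, W) select (N, C) feature vectors.
    src = feat_ref.permute(0, 2, 3, 1)[source_mask].mean(dim=0)
    tgt = feat_edit.permute(0, 2, 3, 1)[target_mask].mean(dim=0)
    return 1.0 - F.cosine_similarity(src, tgt, dim=0)

def guided_denoise_step(latent, t, unet_with_features, scheduler,
                        feat_ref, source_mask, target_mask, guidance_scale=4.0):
    """One denoising step whose noise prediction is steered by the gradient of
    the correspondence loss, analogous to classifier guidance."""
    latent = latent.detach().requires_grad_(True)
    # Assumed to return the noise prediction plus one intermediate feature map.
    noise_pred, feat_edit = unet_with_features(latent, t)
    loss = correspondence_loss(feat_edit, feat_ref, target_mask, source_mask)
    grad = torch.autograd.grad(loss, latent)[0]
    # Push the prediction in the direction that lowers the loss; the editing
    # signal comes from the image's own features, with no fine-tuning or
    # additional modules.
    noise_pred = noise_pred + guidance_scale * grad
    return scheduler.step(noise_pred, t, latent.detach()).prev_sample
```

For the consistency term, the abstract only states that cross-branch self-attention ties the edited result to the original image; one common way to realize that idea is to inject the keys and values of the reconstruction branch's self-attention into the editing branch, which would complement the gradient guidance sketched above.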
Related papers
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as an external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z)
- Editable Image Elements for Controllable Synthesis [79.58148778509769]
We propose an image representation that promotes spatial editing of input images using a diffusion model.
We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition.
arXiv Detail & Related papers (2024-04-24T17:59:11Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing [27.661609140918916]
InFusion is a framework for zero-shot text-based video editing.
It supports editing of multiple concepts with pixel-level control over diverse concepts mentioned in the editing prompt.
Our framework is a low-cost alternative to one-shot tuned models for editing since it does not require training.
arXiv Detail & Related papers (2023-07-22T17:05:47Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z)
- DiffUTE: Universal Text Editing Diffusion Model [32.384236053455]
We propose a universal self-supervised text editing diffusion model (DiffUTE)
It aims to replace or modify words in the source image while maintaining a realistic appearance.
Our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.
arXiv Detail & Related papers (2023-05-18T09:06:01Z)
- PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z)