DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
- URL: http://arxiv.org/abs/2306.14435v6
- Date: Sun, 7 Apr 2024 18:04:04 GMT
- Title: DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
- Authors: Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai,
- Abstract summary: DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
- Score: 94.24479528298252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and controllable image editing is a challenging task that has attracted significant attention recently. Notably, DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision. However, due to its reliance on generative adversarial networks (GANs), its generality is limited by the capacity of pretrained GAN models. In this work, we extend this editing framework to diffusion models and propose a novel approach DragDiffusion. By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images. Our approach involves optimizing the diffusion latents to achieve precise spatial control. The supervision signal of this optimization process is from the diffusion model's UNet features, which are known to contain rich semantic and geometric information. Moreover, we introduce two additional techniques, namely LoRA fine-tuning and latent-MasaCtrl, to further preserve the identity of the original image. Lastly, we present a challenging benchmark dataset called DragBench -- the first benchmark to evaluate the performance of interactive point-based image editing methods. Experiments across a wide range of challenging cases (e.g., images with multiple objects, diverse object categories, various styles, etc.) demonstrate the versatility and generality of DragDiffusion. Code: https://github.com/Yujun-Shi/DragDiffusion.
Related papers
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z) - The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP [4.710921988115686]
We show that CLIP is capable of performing disentangled editing in a zero-shot manner.
This insight may open opportunities for applying this method to various tasks, including image and video editing.
arXiv Detail & Related papers (2024-06-01T14:46:57Z) - GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models [4.710921988115686]
We introduce GANTASTIC, a novel framework that takes existing directions from pre-trained GAN models and transfers these directions into diffusion-based models.
This approach not only maintains the generative quality and diversity that diffusion models are known for but also significantly enhances their capability to perform precise, targeted image edits.
arXiv Detail & Related papers (2024-03-28T17:55:16Z) - DiffMorpher: Unleashing the Capability of Diffusion Models for Image
Morphing [28.593023489682654]
We present DiffMorpher, the first approach enabling smooth and natural image morphing using diffusion models.
Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition.
In addition, we propose an attention and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images.
arXiv Detail & Related papers (2023-12-12T16:28:08Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold [79.94300820221996]
DragGAN is a new way of controlling generative adversarial networks (GANs)
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z) - Blended Latent Diffusion [18.043090347648157]
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
Our solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space.
arXiv Detail & Related papers (2022-06-06T17:58:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.