LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
- URL: http://arxiv.org/abs/2405.13722v2
- Date: Mon, 16 Sep 2024 04:25:50 GMT
- Title: LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
- Authors: Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng,
- Abstract summary: We present LightningDrag, a rapid approach enabling high quality drag-based image editing in 1 second.
Unlike most previous methods, we redefine drag-based editing as a conditional generation task.
Our approach can significantly outperform previous methods in terms of accuracy and consistency.
- Score: 101.59710862476041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minute per edit) and low success rates. Addressing these issues head on, we present LightningDrag, a rapid approach enabling high quality drag-based image editing in ~1 second. Unlike most previous methods, we redefine drag-based editing as a conditional generation task, eliminating the need for time-consuming latent optimization or gradient-based guidance during inference. In addition, the design of our pipeline allows us to train our model on large-scale paired video frames, which contain rich motion information such as object translations, changing poses and orientations, zooming in and out, etc. By learning from videos, our approach can significantly outperform previous methods in terms of accuracy and consistency. Despite being trained solely on videos, our model generalizes well to perform local shape deformations not presented in the training data (e.g., lengthening of hair, twisting rainbows, etc.). Extensive qualitative and quantitative evaluations on benchmark datasets corroborate the superiority of our approach. The code and model will be released at https://github.com/magic-research/LightningDrag.
Related papers
- Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors [54.8852848659663]
Buffer Anytime is a framework for estimation of depth and normal maps (which we call geometric buffers) from video.
We demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints.
arXiv Detail & Related papers (2024-11-26T09:28:32Z) - Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.
Despite their robust generative capabilities, these models often suffer from inaccurate inversion, which could limit their effectiveness in downstream tasks such as image and video editing.
We propose RF-r, a novel training-free sampler that enhances inversion precision by reducing errors in the process of solving rectified flow ODEs.
arXiv Detail & Related papers (2024-11-07T14:29:02Z) - Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner [28.939227214483953]
This paper employs a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process.
We show that our method achieves state-of-the-art (SOTA) inference speed and image editing performance at the pixel-level granularity.
arXiv Detail & Related papers (2024-07-26T10:45:57Z) - FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - Pix2Video: Video Editing using Image Diffusion [43.07444438561277]
We investigate how to use pre-trained image models for text-guided video editing.
Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth) image diffusion model to perform text-guided edits on an anchor frame.
We demonstrate that realistic text-guided video edits are possible, without any compute-intensive preprocessing or video-specific finetuning.
arXiv Detail & Related papers (2023-03-22T16:36:10Z) - Temporally Consistent Semantic Video Editing [44.50322018842475]
We present a simple yet effective method to facilitate temporally coherent video editing.
Our core idea is to minimize the temporal photometric inconsistency by optimizing both the latent code and the pre-trained generator.
arXiv Detail & Related papers (2022-06-21T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.