RegionDrag: Fast Region-Based Image Editing with Diffusion Models
- URL: http://arxiv.org/abs/2407.18247v1
- Date: Thu, 25 Jul 2024 17:59:13 GMT
- Title: RegionDrag: Fast Region-Based Image Editing with Diffusion Models
- Authors: Jingyi Lu, Xinghui Li, Kai Han
- Abstract summary: RegionDrag is a copy-and-paste dragging method that allows users to express their editing instructions in the form of handle and target regions.
RegionDrag completes the edit on an image with a resolution of 512x512 in less than 2 seconds, which is more than 100x faster than DragDiffusion.
- Score: 14.65208340413507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point-drag-based image editing methods, like DragDiffusion, have attracted significant attention. However, point-drag-based approaches suffer from computational overhead and misinterpretation of user intentions due to the sparsity of point-based editing instructions. In this paper, we propose a region-based copy-and-paste dragging method, RegionDrag, to overcome these limitations. RegionDrag allows users to express their editing instructions in the form of handle and target regions, enabling more precise control and alleviating ambiguity. In addition, region-based operations complete editing in one iteration and are much faster than point-drag-based methods. We also incorporate the attention-swapping technique for enhanced stability during editing. To validate our approach, we extend existing point-drag-based datasets with region-based dragging instructions. Experimental results demonstrate that RegionDrag outperforms existing point-drag-based approaches in terms of speed, accuracy, and alignment with user intentions. Remarkably, RegionDrag completes the edit on an image with a resolution of 512x512 in less than 2 seconds, which is more than 100x faster than DragDiffusion, while achieving better performance. Project page: https://visual-ai.github.io/regiondrag.
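To make the core mechanism concrete, here is a minimal sketch of region-based latent copy-and-paste, assuming the two user-drawn regions have already been converted into a one-to-one pixel correspondence; the names and single-assignment structure are illustrative assumptions, not the authors' implementation, and the attention-swapping step is omitted.

```python
# Minimal sketch of region-based latent copy-and-paste (illustrative, not the
# authors' code). Assumes handle_xy[i] corresponds to target_xy[i].
import torch

def region_copy_paste(latent: torch.Tensor,
                      handle_xy: torch.Tensor,
                      target_xy: torch.Tensor) -> torch.Tensor:
    """Copy latent features from handle positions to target positions.

    latent    : (C, H, W) latent of the DDIM-inverted image
    handle_xy : (N, 2) integer (x, y) coordinates inside the handle region
    target_xy : (N, 2) integer (x, y) coordinates inside the target region
    """
    edited = latent.clone()
    hx, hy = handle_xy[:, 0], handle_xy[:, 1]
    tx, ty = target_xy[:, 0], target_xy[:, 1]
    edited[:, ty, tx] = latent[:, hy, hx]  # paste handle features onto targets
    return edited

# Toy usage: a Stable Diffusion-sized latent and a two-pixel drag.
latent = torch.randn(4, 64, 64)
handle = torch.tensor([[10, 10], [11, 10]])
target = torch.tensor([[20, 12], [21, 12]])
edited = region_copy_paste(latent, handle, target)
# Because the paste is a single assignment rather than an iterative point-wise
# optimization, the edit completes in one pass before the final denoising.
```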
Related papers
- ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention [81.12932992203885]
We introduce ContextDrag, a new paradigm for drag-based editing. By incorporating VAE-encoded features from the reference image, ContextDrag can leverage rich contextual cues and preserve fine-grained details.
arXiv Detail & Related papers (2025-12-09T10:51:45Z)
- DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing [19.031261008813644]
This work proposes the first framework to effectively harness FLUX's rich prior for drag-based editing, dubbed DragFlow. To overcome the limited supervision of sparse drag points, DragFlow introduces a region-based editing paradigm, where affine transformations enable richer and more consistent feature supervision (a toy illustration follows this entry). Experiments on DragBench-DR and ReD Bench show that DragFlow surpasses both point-based and region-based baselines.
arXiv Detail & Related papers (2025-10-02T17:39:13Z)
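As a rough illustration of region-based supervision, the sketch below fits an affine map from a handle region to a target region, so every pixel in the region (not just a few drag points) gets a supervised destination. All names are hypothetical, and DragFlow's actual features and losses are not reproduced.

```python
# Hypothetical illustration of region-level supervision via an affine map;
# DragFlow's actual formulation may differ.
import numpy as np

def fit_affine(src_pts: np.ndarray, dst_pts: np.ndarray):
    """Least-squares affine map (A, t) with dst ~= src @ A.T + t.

    src_pts, dst_pts : (N, 2) corresponding coordinates, N >= 3.
    """
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])             # homogeneous coords
    params, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)  # (3, 2) solution
    return params[:2].T, params[2]                        # A (2x2), t (2,)

# Drag a square region 15 px right and 5 px down.
src = np.array([[10, 10], [40, 10], [10, 40], [40, 40]], dtype=float)
dst = src + np.array([15.0, 5.0])
A, t = fit_affine(src, dst)
mapped = src @ A.T + t  # every region pixel now has a target location,
                        # giving denser supervision than sparse drag points
```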
- DragNeXt: Rethinking Drag-Based Image Editing [81.9430401732008]
Drag-Based Image Editing (DBIE) allows users to manipulate images by directly dragging objects within them. It faces two key challenges; in particular, point-based drag is often highly ambiguous and difficult to align with users' intentions. We propose a simple-yet-effective editing framework, dubbed DragNeXt.
arXiv Detail & Related papers (2025-06-09T10:24:29Z)
- DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics [71.78350994830885]
We present a novel approach to improving text-guided image editing using diffusion-based models.
Our method uses visual and textual self-attention to enhance the cross-attention map, which can serve as regional cues to improve editing performance (a toy sketch follows this entry).
To fully compare our method with other DiT-based approaches, we construct the RW-800 benchmark, featuring high-resolution images, long descriptive texts, real-world images, and a new text editing task.
arXiv Detail & Related papers (2025-03-21T02:14:03Z)
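One common way to turn cross-attention into regional cues is to propagate its scores along self-attention similarities; the sketch below shows that generic refinement, with shapes and names as assumptions rather than DCEdit's exact formulation.

```python
# Generic cross-attention refinement via self-attention propagation;
# an assumption-laden sketch, not DCEdit's exact method.
import numpy as np

def refine_cross_attention(self_attn: np.ndarray, cross_attn: np.ndarray):
    """Smooth pixel-to-token scores over visually similar pixels.

    self_attn  : (HW, HW) pixel-to-pixel similarities (rows sum to 1)
    cross_attn : (HW, T)  pixel-to-token cross-attention scores
    Returns a refined (HW, T) map usable as regional editing cues.
    """
    refined = self_attn @ cross_attn
    return refined / (refined.sum(axis=0, keepdims=True) + 1e-8)

# Toy usage with random maps.
hw, tokens = 16, 4
sa = np.random.rand(hw, hw)
sa /= sa.sum(axis=1, keepdims=True)   # row-normalize like attention weights
ca = np.random.rand(hw, tokens)
cues = refine_cross_attention(sa, ca)
```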
- EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing [48.05033786803384]
We propose a framework, named EEdit, to achieve efficient image editing.
Experiments demonstrate an average of 2.46× acceleration without performance drop in a wide range of editing tasks.
arXiv Detail & Related papers (2025-03-13T11:26:45Z)
- Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting [55.14822004410817]
We introduce DYG, an effective 3D drag-based editing method for 3D Gaussian Splatting.
It enables precise control over the extent of editing through the input of 3D masks and pairs of control points.
DYG integrates the strengths of the implicit triplane representation to establish the geometric scaffold of the editing results.
arXiv Detail & Related papers (2025-01-30T18:51:54Z)
- Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing [9.398831289389749]
We propose CLIPDrag, a novel image editing method that combines text and drag signals for precise and ambiguity-free manipulations.
CLIPDrag outperforms existing drag-only and text-only editing methods.
arXiv Detail & Related papers (2024-10-04T02:46:09Z)
- FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly improves editing speed.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z)
- LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos [101.59710862476041]
We present LightningDrag, a rapid approach enabling high-quality drag-based image editing in 1 second.
Unlike most previous methods, we redefine drag-based editing as a conditional generation task.
Our approach can significantly outperform previous methods in terms of accuracy and consistency.
arXiv Detail & Related papers (2024-05-22T15:14:00Z)
- GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models [31.708968272342315]
We introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing.
GoodDrag introduces an Alternating Drag and Denoising (AlDD) framework that alternates between drag and denoising operations within the diffusion process.
We also propose an information-preserving motion supervision operation that maintains the original features of the starting point for precise manipulation and artifact reduction.
arXiv Detail & Related papers (2024-04-10T17:59:59Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- Object-Centric Diffusion for Efficient Video Editing [64.71639719352636]
Diffusion-based video editing has reached impressive quality.
Such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames.
We propose modifications that allow significant speed-ups whilst maintaining quality.
arXiv Detail & Related papers (2024-01-11T08:36:15Z)
- ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-12-28T02:54:34Z)
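A Region-IoU match can be as simple as scoring every segmentation mask against the predicted edit region and keeping the best overlap; the sketch below is that baseline reading, with hypothetical names, not ZONE's exact scheme.

```python
# Minimal Region-IoU matching sketch (hypothetical names; ZONE's exact
# scheme may differ).
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return np.logical_and(mask_a, mask_b).sum() / union

def pick_layer(edit_region: np.ndarray, segment_masks: list) -> np.ndarray:
    """Return the segmentation mask that best overlaps the edit region."""
    scores = [iou(edit_region, m) for m in segment_masks]
    return segment_masks[int(np.argmax(scores))]

# Toy usage: two candidate masks; pick the one covering the edited area.
edit = np.zeros((8, 8), bool); edit[2:5, 2:5] = True
m1 = np.zeros((8, 8), bool); m1[0:3, 0:3] = True
m2 = np.zeros((8, 8), bool); m2[2:6, 2:6] = True
best = pick_layer(edit, [m1, m2])  # -> m2
```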
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Region-Aware Diffusion for Zero-shot Text-driven Image Editing [78.58917623854079]
We propose a novel region-aware diffusion model (RDM) for entity-level image editing.
To strike a balance between image fidelity and inference speed, we design an intensive diffusion pipeline.
The results show that RDM outperforms the previous approaches in terms of visual quality, overall harmonization, non-editing region content preservation, and text-image semantic consistency.
arXiv Detail & Related papers (2023-02-23T06:20:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.