ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention
- URL: http://arxiv.org/abs/2512.08477v1
- Date: Tue, 09 Dec 2025 10:51:45 GMT
- Title: ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention
- Authors: Huiguo He, Pengyu Yan, Ziqi Yi, Weizhi Zhong, Zheng Liu, Yejun Tang, Huan Yang, Kun Gai, Guanbin Li, Lianwen Jin
- Abstract summary: We introduce ContextDrag, a new paradigm for drag-based editing. By incorporating VAE-encoded features from the reference image, ContextDrag can leverage rich contextual cues and preserve fine-grained details.
- Score: 81.12932992203885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drag-based image editing aims to modify visual content following user-specified drag operations. Although existing methods have made notable progress, they still fail to fully exploit the contextual information in the reference image, including fine-grained texture details, leading to edits with limited coherence and fidelity. To address this challenge, we introduce ContextDrag, a new paradigm for drag-based editing that leverages the strong contextual modeling capability of editing models such as FLUX-Kontext. By incorporating VAE-encoded features from the reference image, ContextDrag can leverage rich contextual cues and preserve fine-grained details without the need for fine-tuning or inversion. First, ContextDrag introduces a novel Context-preserving Token Injection (CTI) that injects noise-free reference features into their correct destination locations via a Latent-space Reverse Mapping (LRM) algorithm. This strategy enables precise drag control while preserving consistency in both semantics and texture details. Second, ContextDrag adopts a novel Position-Consistent Attention (PCA), which positionally re-encodes the reference tokens and applies overlap-aware masking to eliminate interference from irrelevant reference features. Extensive experiments on DragBench-SR and DragBench-DR demonstrate that our approach surpasses all existing SOTA methods. Code will be publicly available.
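To make the abstract's mechanism concrete, below is a minimal sketch of the data flow it describes: reverse-mapping destination tokens to reference positions, injecting noise-free reference features, and masking attention to irrelevant reference tokens. The function names (`lrm_map`, `inject_reference_tokens`, `overlap_aware_mask`), the nearest-drag approximation, the influence radii, and the latent layout are all illustrative assumptions, not the authors' actual algorithm or API.

```python
import torch

def lrm_map(dst_coords, drags, radius=4.0):
    """Latent-space Reverse Mapping (sketch): send each destination latent
    coordinate back to a source coordinate by undoing the nearest drag
    offset. `drags` is a list of ((sx, sy), (dx, dy)) pairs in latent-grid
    units; `radius` is a hypothetical influence radius."""
    src = dst_coords.clone().float()
    for s_xy, d_xy in drags:
        s = torch.tensor(s_xy, dtype=torch.float)
        d = torch.tensor(d_xy, dtype=torch.float)
        near = (dst_coords.float() - d).norm(dim=-1) < radius
        src[near] -= (d - s)  # undo the drag offset for nearby tokens
    return src.round().long()

def inject_reference_tokens(noisy_latent, ref_latent, drags):
    """Context-preserving Token Injection (sketch): copy noise-free
    VAE-encoded reference features into their mapped destination positions
    in the noisy latent. Latents are (H, W, C); hole-filling at the
    vacated source region is left to the diffusion model itself."""
    H, W, C = noisy_latent.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dst = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    src = lrm_map(dst, drags)
    src[:, 0].clamp_(0, W - 1)
    src[:, 1].clamp_(0, H - 1)
    out = noisy_latent.clone()
    moved = (src != dst).any(dim=-1)  # only dragged tokens are overwritten
    out.reshape(-1, C)[moved] = ref_latent[src[moved, 1], src[moved, 0]]
    return out

def overlap_aware_mask(edit_pos, ref_pos, radius=2.0):
    """Position-Consistent Attention (sketch): after re-encoding reference
    token positions to their destination coordinates, let each edited token
    attend only to reference tokens within `radius`, masking out the rest
    to suppress interference from irrelevant reference features."""
    return torch.cdist(edit_pos.float(), ref_pos.float()) <= radius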
Related papers
- DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment [21.951152436940536]
Drag-based image editing using generative models provides intuitive control over image structures. Existing methods rely heavily on manually provided masks and textual prompts to preserve semantic fidelity and motion precision. We propose DirectDrag, a novel mask- and prompt-free editing framework.
arXiv Detail & Related papers (2025-12-03T17:12:00Z) - InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing [6.95116998047811]
InstructUDrag is a diffusion-based framework that combines text instructions with object dragging. Our framework treats object dragging as an image reconstruction process, divided into two synergistic branches. InstructUDrag facilitates flexible, high-fidelity image editing, offering both precision in object relocation and semantic control over image content.
arXiv Detail & Related papers (2025-10-09T13:06:49Z) - Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! [88.12304235156591]
We propose stReaming drag-oriEnted interactiVe vidEo manipuLation (REVEL), a new task that enables users to modify generated videos anytime on anything via fine-grained, interactive drag. Our method can be seamlessly integrated into existing autoregressive video diffusion models.
arXiv Detail & Related papers (2025-10-03T22:38:35Z) - DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing [19.031261008813644]
This work proposes the first framework to effectively harness FLUX's rich prior for drag-based editing, dubbed DragFlow. DragFlow introduces a region-based editing paradigm, where affine transformations enable richer and more consistent feature supervision (see the illustrative sketch after this list). Experiments on DragBench-DR and ReD Bench show that DragFlow surpasses both point-based and region-based baselines.
arXiv Detail & Related papers (2025-10-02T17:39:13Z) - TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation [51.72432192816058]
We propose a unified diffusion-based framework for joint drag-text image editing. Our framework introduces two key innovations: (1) Point-Cloud Deterministic Drag, which enhances latent-space layout control through 3D feature mapping, and (2) Drag-Text Guided Denoising, which dynamically balances the influence of drag and text conditions during denoising (see the illustrative sketch after this list).
arXiv Detail & Related papers (2025-09-26T05:39:03Z) - LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence [31.686266704795273]
We introduce LazyDrag, the first drag-based image editing method for Multi-Modal Diffusion Transformers. LazyDrag directly eliminates the reliance on implicit point matching. It unifies precise geometric control with text guidance, enabling complex edits that were previously out of reach.
arXiv Detail & Related papers (2025-09-15T17:59:47Z) - DragNeXt: Rethinking Drag-Based Image Editing [81.9430401732008]
Drag-Based Image Editing (DBIE) allows users to manipulate images by directly dragging objects within them. Among its key challenges, point-based drag is often highly ambiguous and difficult to align with users' intentions. We propose a simple-yet-effective editing framework, dubbed DragNeXt.
arXiv Detail & Related papers (2025-06-09T10:24:29Z) - FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly accelerates editing.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z) - DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images [55.546024767130994]
We propose a novel model to enhance the text-based control of an image editor by explicitly reasoning about which parts of the image to alter or preserve.
It relies on word alignments between a description of the original source image and the instruction reflecting the needed updates, together with the input image.
It is evaluated on a subset of the Bison dataset and a self-defined dataset dubbed Dream.
arXiv Detail & Related papers (2024-04-27T22:45:47Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
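For the DragFlow entry above, here is a minimal sketch of region-based affine feature supervision, as forward-referenced in that entry. The function `affine_region_loss`, the masked mean-squared-error objective, and the tensor shapes are illustrative assumptions rather than the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def affine_region_loss(edited_feat, ref_feat, region_mask, theta):
    """edited_feat, ref_feat: (1, C, H, W); region_mask: (1, 1, H, W) in {0, 1};
    theta: (1, 2, 3) affine matrix mapping destination coords back to source
    coords, as expected by F.affine_grid."""
    grid = F.affine_grid(theta, list(ref_feat.shape), align_corners=False)
    warped = F.grid_sample(ref_feat, grid, align_corners=False)  # reference features transported by the affine
    warped_mask = F.grid_sample(region_mask, grid, align_corners=False)
    # Supervise edited features to match the transported reference features
    # inside the moved region (masked mean-squared error).
    return ((edited_feat - warped) ** 2 * warped_mask).sum() / warped_mask.sum().clamp(min=1.0)
```

Supervising a whole transformed region, rather than a handful of points, is what the blurb suggests makes the feature supervision "richer and more consistent".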
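For the TDEdit entry above, here is a minimal sketch of dynamically balancing drag and text conditions during denoising, as forward-referenced in that entry. The linear schedule and the classifier-free-guidance-style three-branch combination are illustrative assumptions, not the paper's method.

```python
import torch

def balanced_noise_pred(eps_uncond, eps_text, eps_drag, t, T,
                        w_text=7.5, w_drag=4.0):
    """Hypothetical schedule: early (noisy) steps weight the drag branch to
    settle layout; later steps weight the text branch for semantics."""
    lam = t / T  # 1.0 at the first denoising step, approaching 0.0 at the last
    return (eps_uncond
            + lam * w_drag * (eps_drag - eps_uncond)
            + (1.0 - lam) * w_text * (eps_text - eps_uncond))

# Example usage with dummy noise predictions:
eps_u, eps_t, eps_d = (torch.randn(1, 4, 64, 64) for _ in range(3))
eps = balanced_noise_pred(eps_u, eps_t, eps_d, t=800, T=1000)
```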