DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment
- URL: http://arxiv.org/abs/2512.03981v1
- Date: Wed, 03 Dec 2025 17:12:00 GMT
- Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment
- Authors: Sheng-Hao Liao, Shang-Fu Chen, Tai-Ming Huang, Wen-Huang Cheng, Kai-Lung Hua
- Abstract summary: Drag-based image editing using generative models provides intuitive control over image structures. Existing methods rely heavily on manually provided masks and textual prompts to preserve semantic fidelity and motion precision. We propose DirectDrag, a novel mask- and prompt-free editing framework.
- Score: 21.951152436940536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drag-based image editing using generative models provides intuitive control over image structures. However, existing methods rely heavily on manually provided masks and textual prompts to preserve semantic fidelity and motion precision. Removing these constraints creates a fundamental trade-off: visual artifacts without masks and poor spatial control without prompts. To address these limitations, we propose DirectDrag, a novel mask- and prompt-free editing framework. DirectDrag enables precise and efficient manipulation with minimal user input while maintaining high image fidelity and accurate point alignment. DirectDrag introduces two key innovations. First, we design an Auto Soft Mask Generation module that intelligently infers editable regions from point displacement, automatically localizing deformation along movement paths while preserving contextual integrity through the generative model's inherent capacity. Second, we develop a Readout-Guided Feature Alignment mechanism that leverages intermediate diffusion activations to maintain structural consistency during point-based edits, substantially improving visual fidelity. Despite operating without manual masks or prompts, DirectDrag achieves superior image quality compared to existing methods while maintaining competitive drag accuracy. Extensive experiments on DragBench and real-world scenarios demonstrate the effectiveness and practicality of DirectDrag for high-quality, interactive image manipulation. Project Page: https://frakw.github.io/DirectDrag/. Code is available at: https://github.com/frakw/DirectDrag.
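The abstract describes inferring editable regions directly from point displacement, localizing deformation along the movement paths. A minimal sketch of that idea, assuming a simple Gaussian-falloff formulation (the paper's actual Auto Soft Mask Generation module may differ; the function name, `sigma`, and `steps` parameters here are illustrative choices, not from the paper):

```python
import numpy as np

def auto_soft_mask(h, w, handle_pts, target_pts, sigma=8.0, steps=32):
    """Sketch of a soft editable-region mask inferred from drag points.

    For each handle->target pair, accumulate a Gaussian falloff around
    sample points along the straight drag path, so pixels near the
    movement path are marked editable (close to 1) and distant pixels
    stay protected (close to 0).
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    mask = np.zeros((h, w), dtype=np.float64)
    for (hx, hy), (tx, ty) in zip(handle_pts, target_pts):
        # Sample evenly along the straight path from handle to target.
        for t in np.linspace(0.0, 1.0, steps):
            cx = hx + t * (tx - hx)
            cy = hy + t * (ty - hy)
            d2 = (xs - cx) ** 2 + (ys - cy) ** 2
            mask = np.maximum(mask, np.exp(-d2 / (2.0 * sigma ** 2)))
    return mask  # values in [0, 1]; 1 = fully editable

# Example: one horizontal drag from (x=10, y=32) to (x=50, y=32).
mask = auto_soft_mask(64, 64, handle_pts=[(10, 32)], target_pts=[(50, 32)])
```

A soft (continuous) mask rather than a binary one lets the edit blend smoothly into the surrounding context, which matches the abstract's emphasis on preserving contextual integrity outside the movement path.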
Related papers
- ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention [81.12932992203885]
We introduce ContextDrag, a new paradigm for drag-based editing. By incorporating VAE-encoded features from the reference image, ContextDrag can leverage rich contextual cues and preserve fine-grained details.
arXiv Detail & Related papers (2025-12-09T10:51:45Z) - LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization [49.945233586949286]
LoVoRA is a novel framework for mask-free video object removal and addition. Our approach integrates image-to-video translation, optical flow-based mask propagation, and video inpainting, enabling temporally consistent edits. LoVoRA achieves end-to-end video editing without requiring external control signals during inference.
arXiv Detail & Related papers (2025-12-02T17:01:07Z) - SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder [52.754326452329956]
We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings. The edits are applied by manipulating the embeddings along carefully chosen directions, which control the strength of the target attribute. Our method operates directly on text embeddings without modifying the diffusion process, making it model-agnostic and broadly applicable to various image backbones.
arXiv Detail & Related papers (2025-10-06T17:51:04Z) - TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation [51.72432192816058]
We propose a unified diffusion-based framework for joint drag-text image editing. Our framework introduces two key innovations: (1) Point-Cloud Deterministic Drag, which enhances latent-space layout control through 3D feature mapping, and (2) Drag-Text Guided Denoising, dynamically balancing the influence of drag and text conditions during denoising.
arXiv Detail & Related papers (2025-09-26T05:39:03Z) - LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence [31.686266704795273]
We introduce LazyDrag, the first drag-based image editing method for Multi-Modal Diffusion Transformers. LazyDrag directly eliminates the reliance on implicit point matching. It unifies precise geometric control with text guidance, enabling complex edits that were previously out of reach.
arXiv Detail & Related papers (2025-09-15T17:59:47Z) - IntrinsicEdit: Precise generative image manipulation in intrinsic space [53.404235331886255]
We introduce a versatile, generative workflow that operates in an intrinsic-image latent space. We address key challenges of identity preservation and intrinsic-channel entanglement. We enable precise, efficient editing with automatic resolution of global illumination effects.
arXiv Detail & Related papers (2025-05-13T18:24:15Z) - BrushEdit: All-In-One Image Inpainting and Editing [76.93556996538398]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm. We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model. Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z) - InstantDrag: Improving Interactivity in Drag-based Image Editing [23.004027029130953]
Drag-based image editing has recently gained popularity for its interactivity and precision. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts.
arXiv Detail & Related papers (2024-09-13T14:19:27Z) - FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z)