TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
- URL: http://arxiv.org/abs/2509.21905v1
- Date: Fri, 26 Sep 2025 05:39:03 GMT
- Title: TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
- Authors: Qihang Wang, Yaxiong Wang, Lechao Cheng, Zhun Zhong
- Abstract summary: We propose a unified diffusion-based framework for joint drag-text image editing. Our framework introduces two key innovations: (1) Point-Cloud Deterministic Drag, which enhances latent-space layout control through 3D feature mapping, and (2) Drag-Text Guided Denoising, dynamically balancing the influence of drag and text conditions during denoising.
- Score: 51.72432192816058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores image editing under the joint control of text and drag interactions. While recent advances in text-driven and drag-driven editing have achieved remarkable progress, they suffer from complementary limitations: text-driven methods excel in texture manipulation but lack precise spatial control, whereas drag-driven approaches primarily modify shape and structure without fine-grained texture guidance. To address these limitations, we propose a unified diffusion-based framework for joint drag-text image editing, integrating the strengths of both paradigms. Our framework introduces two key innovations: (1) Point-Cloud Deterministic Drag, which enhances latent-space layout control through 3D feature mapping, and (2) Drag-Text Guided Denoising, dynamically balancing the influence of drag and text conditions during denoising. Notably, our model supports flexible editing modes - operating with text-only, drag-only, or combined conditions - while maintaining strong performance in each setting. Extensive quantitative and qualitative experiments demonstrate that our method not only achieves high-fidelity joint editing but also matches or surpasses the performance of specialized text-only or drag-only approaches, establishing a versatile and generalizable solution for controllable image manipulation. Code will be made publicly available to reproduce all results presented in this work.
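The abstract names the Drag-Text Guided Denoising mechanism but does not spell out its formulation. One plausible reading is a classifier-free-guidance-style combination of a drag-conditioned and a text-conditioned noise prediction with timestep-dependent weights; the sketch below is an illustrative assumption, not the paper's published method, and every name in it (`guidance_weights`, `combined_eps`, the linear schedule) is hypothetical.

```python
import numpy as np

def guidance_weights(t, T):
    """Hypothetical schedule: drag guidance dominates the early (structural)
    denoising steps, text guidance the later (texture) steps.
    t runs from T (most noise) down to 1."""
    s = t / T                 # ~1.0 at the start of sampling, ~0 at the end
    w_drag = 4.0 * s          # strong layout control while structure forms
    w_text = 4.0 * (1.0 - s)  # strong texture control as details emerge
    return w_text, w_drag

def combined_eps(eps_uncond, eps_text, eps_drag, t, T):
    """CFG-style combination of the text and drag conditions."""
    w_text, w_drag = guidance_weights(t, T)
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_drag * (eps_drag - eps_uncond))

# Toy usage with random stand-ins for the three noise predictions.
T = 50
rng = np.random.default_rng(0)
eps_u, eps_t, eps_d = (rng.standard_normal((4, 64, 64)) for _ in range(3))
for t in range(T, 0, -1):
    eps = combined_eps(eps_u, eps_t, eps_d, t, T)
    # ... the x_t -> x_{t-1} sampler update with eps would go here ...
```

A schedule like this encodes the division of labor the abstract describes: drag chiefly shapes structure during the early denoising steps, while text chiefly shapes texture during the later ones; the paper's model presumably learns this balance rather than fixing it by hand.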
Related papers
- DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment [21.951152436940536]
Drag-based image editing using generative models provides intuitive control over image structures. Existing methods rely heavily on manually provided masks and textual prompts to preserve semantic fidelity and motion precision. We propose DirectDrag, a novel mask- and prompt-free editing framework.
arXiv Detail & Related papers (2025-12-03T17:12:00Z) - SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control [50.76070785417023]
We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
arXiv Detail & Related papers (2025-11-12T20:21:37Z) - Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing [76.44219733285898]
Kontinuous Kontext is an instruction-driven editing model that provides a new dimension of control over edit strength. A lightweight projector network maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models.
arXiv Detail & Related papers (2025-10-09T17:51:03Z) - SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder [52.754326452329956]
We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings. The edits are applied by manipulating the embeddings along carefully chosen directions, which control the strength of the target attribute (a minimal sketch of this manipulation appears after this list). Our method operates directly on text embeddings without modifying the diffusion process, making it model-agnostic and broadly applicable to various image backbones.
arXiv Detail & Related papers (2025-10-06T17:51:04Z) - DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics [71.78350994830885]
We present a novel approach to improving text-guided image editing using diffusion-based models. Our method uses visual and textual self-attention to enhance the cross-attention map, which can serve as regional cues to improve editing performance. To fully compare our method with other DiT-based approaches, we construct the RW-800 benchmark, featuring high-resolution images, long descriptive texts, real-world images, and a new text editing task.
arXiv Detail & Related papers (2025-03-21T02:14:03Z) - CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing [9.398831289389749]
We propose CLIPDrag, a novel image editing method that combines text and drag signals for precise and ambiguity-free manipulations. CLIPDrag outperforms existing drag-based or text-based methods.
arXiv Detail & Related papers (2024-10-04T02:46:09Z) - DragText: Rethinking Text Embedding in Point-based Image Editing [3.4248731707266264]
Point-based image editing enables accurate and flexible control through content dragging. The role of the text embedding during the editing process has not been thoroughly investigated. We propose DragText, which optimizes the text embedding in conjunction with the dragging process to pair with the modified image embedding.
arXiv Detail & Related papers (2024-07-25T07:57:55Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance [15.130419159003816]
We present a versatile image editing framework capable of executing both rigid and non-rigid edits.
We leverage a dual-path injection scheme to handle diverse editing scenarios.
We introduce an integrated self-attention mechanism for fusion of appearance and structural information.
arXiv Detail & Related papers (2024-01-04T08:21:30Z) - Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z) - StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. They suffer from two problems: unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z) - DE-Net: Dynamic Text-guided Image Editing Adversarial Networks [82.67199573030513]
We propose a Dynamic Editing Block (DEBlock) which combines spatial- and channel-wise manipulations dynamically for various editing requirements.
Our DE-Net achieves excellent performance and manipulates source images more effectively and accurately.
arXiv Detail & Related papers (2022-06-02T17:20:52Z)
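As an aside on the SAEdit entry above: its token-level, strength-controlled manipulation of text embeddings can be pictured as shifting one token's embedding along an attribute direction scaled by a slider value. The sketch below is a minimal illustration under that assumption; in the actual paper the directions come from a sparse autoencoder, whereas here a random unit vector stands in, and `edit_token_embedding` is a hypothetical helper, not the paper's API.

```python
import numpy as np

def edit_token_embedding(token_embs, token_idx, direction, alpha):
    """Shift one token's embedding along a precomputed attribute direction.

    token_embs : (num_tokens, dim) prompt embedding matrix
    direction  : (dim,) unit vector for the target attribute
    alpha      : edit strength; 0.0 leaves the prompt unchanged
    """
    edited = token_embs.copy()
    edited[token_idx] += alpha * direction
    return edited

# Toy usage: continuously strengthen an attribute on token 3.
rng = np.random.default_rng(0)
embs = rng.standard_normal((77, 768))   # CLIP-like prompt embedding shape
d = rng.standard_normal(768)
d /= np.linalg.norm(d)                  # unit attribute direction
for alpha in (0.0, 0.5, 1.0):           # slider over edit strength
    cond = edit_token_embedding(embs, token_idx=3, direction=d, alpha=alpha)
    # ... feed `cond` to the frozen diffusion model's text pathway ...
```

Because the diffusion process itself is untouched and only the conditioning embeddings change, such an edit is model-agnostic in the sense the SAEdit summary claims.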