Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
- URL: http://arxiv.org/abs/2508.08134v3
- Date: Sat, 04 Oct 2025 05:28:39 GMT
- Title: Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
- Authors: Zeqian Long, Mingzhe Zheng, Kunyu Feng, Xinhua Zhang, Hongyu Liu, Harry Yang, Linfeng Zhang, Qifeng Chen, Yue Ma
- Abstract summary: Follow-Your-Shape is a training-free and mask-free framework that supports precise and controllable editing of object shapes. We compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. Our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
- Score: 52.87568958372421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent flow-based image editing models demonstrate general-purpose capabilities across diverse tasks, they often struggle to specialize in challenging scenarios -- particularly those involving large-scale shape transformations. When performing such structural edits, these methods either fail to achieve the intended shape change or inadvertently alter non-target regions, resulting in degraded background quality. We propose Follow-Your-Shape, a training-free and mask-free framework that supports precise and controllable editing of object shapes while strictly preserving non-target content. Motivated by the divergence between inversion and editing trajectories, we compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. The TDM enables precise localization of editable regions and guides a Scheduled KV Injection mechanism that ensures stable and faithful editing. To facilitate a rigorous evaluation, we introduce ReShapeBench, a new benchmark comprising 120 new images and enriched prompt pairs specifically curated for shape-aware editing. Experiments demonstrate that our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
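A minimal sketch of how a Trajectory Divergence Map might be computed, based only on the abstract's description (token-wise velocity differences between the inversion and denoising paths). The array shapes, the mean aggregation over timesteps, and the min-max normalization are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def trajectory_divergence_map(v_inv, v_edit, eps=1e-8):
    """Hypothetical TDM sketch following the abstract's description.

    v_inv:  velocities along the inversion path, shape (T, N, D)
    v_edit: velocities along the denoising/editing path, same shape,
            with T timesteps, N image tokens, D channels.

    Averaging over timesteps and min-max normalizing the result are
    assumptions; the paper may aggregate differently.
    """
    per_token = np.linalg.norm(v_edit - v_inv, axis=-1)  # (T, N) per-token divergence
    tdm = per_token.mean(axis=0)                         # (N,) aggregate over timesteps
    return (tdm - tdm.min()) / (tdm.max() - tdm.min() + eps)
```

Under this reading, tokens with high TDM values are those whose trajectories diverge most and would be treated as editable regions, while the paper's Scheduled KV Injection would preserve the low-divergence (non-target) tokens.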
Related papers
- Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing [76.44219733285898]
Kontinuous Kontext is an instruction-driven editing model that provides a new dimension of control over edit strength. A lightweight projector network maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models.
arXiv Detail & Related papers (2025-10-09T17:51:03Z)
- Training-free Geometric Image Editing on Diffusion Models [53.38549950608886]
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped. We propose a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement. Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine.
arXiv Detail & Related papers (2025-07-31T07:36:00Z)
- CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing [24.68304617869157]
Context-Preserving Adaptive Manipulation (CPAM) is a novel framework for complicated, non-rigid real image editing. We develop a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and background effectively. We also introduce various mask-guidance strategies to facilitate diverse image manipulation tasks in a simple manner.
arXiv Detail & Related papers (2025-06-23T09:19:38Z)
- Image Editing As Programs with Diffusion Models [69.05164729625052]
We introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions.
arXiv Detail & Related papers (2025-06-04T16:57:24Z)
- Training-Free Text-Guided Image Editing with Visual Autoregressive Model [46.201510044410995]
We propose a novel text-guided image editing framework based on Visual AutoRegressive modeling. Our method eliminates the need for explicit inversion while ensuring precise and controlled modifications. Our framework operates in a training-free manner and achieves high-fidelity editing with faster inference speeds.
arXiv Detail & Related papers (2025-03-31T09:46:56Z)
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps. Our method effectively minimizes distortions caused by varying text conditions during noise prediction. Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z)
- Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing [43.97960454977206]
In this paper, we analyze the diffusion inversion and invariance control based on the flow transformer. We propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error. This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation.
arXiv Detail & Related papers (2024-11-24T13:48:16Z)
- Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
- Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC).
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.