Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
- URL: http://arxiv.org/abs/2602.08388v1
- Date: Mon, 09 Feb 2026 08:39:47 GMT
- Title: Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
- Authors: Shuo Zhang, Wenzhuo Wu, Huayu Zhang, Jiarong Cheng, Xianghao Zang, Chao Ban, Hao Sun, Zhongjiang He, Tianwei Cao, Kongming Liang, Zhanyu Ma
- Abstract summary: GeoEdit is a framework that integrates geometric transformations through in-context generation for precise object edits. Effects-Sensitive Attention enhances the modeling of intricate lighting and shadow effects for improved realism. RS-Objects is a large-scale geometric editing dataset containing over 120,000 high-quality image pairs.
- Score: 41.08668138583002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, particularly in complex scenes. Existing approaches suffer from two main limitations: (1) difficulty in achieving accurate geometric editing of object translation, rotation, and scaling; (2) inadequate modeling of intricate lighting and shadow effects, leading to unrealistic results. To address these issues, we propose GeoEdit, a framework that leverages in-context generation through a diffusion transformer module, which integrates geometric transformations for precise object edits. Moreover, we introduce Effects-Sensitive Attention, which enhances the modeling of intricate lighting and shadow effects for improved realism. To further support training, we construct RS-Objects, a large-scale geometric editing dataset containing over 120,000 high-quality image pairs, enabling the model to learn precise geometric editing while generating realistic lighting and shadows. Extensive experiments on public benchmarks demonstrate that GeoEdit consistently outperforms state-of-the-art methods in terms of visual quality, geometric accuracy, and realism.
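The abstract does not spell out how Effects-Sensitive Attention is implemented; one plausible reading is standard scaled dot-product attention with an additive bias toward image tokens in lighting- and shadow-affected regions. The sketch below follows that assumption: the function name, tensor shapes, and the `effects_mask` and `bias_scale` parameters are illustrative, not the authors' code.

```python
import torch

def effects_sensitive_attention(q, k, v, effects_mask, bias_scale=1.0):
    """Scaled dot-product attention with an additive bias toward
    tokens flagged by an effects (lighting/shadow) mask.

    q, k, v: (batch, heads, tokens, dim) query/key/value tensors.
    effects_mask: (batch, tokens) soft mask in [0, 1] marking image
        tokens whose lighting or shadows the edit is expected to change.
    """
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.transpose(-2, -1)) * scale            # (b, h, tq, tk)
    # Bias attention toward effect-relevant key tokens so the edited
    # object can draw context from regions where shadows and highlights
    # must adapt after the geometric transformation.
    bias = bias_scale * effects_mask[:, None, None, :]    # broadcast over heads and queries
    attn = torch.softmax(logits + bias, dim=-1)
    return attn @ v
```

In this reading the bias acts only on key tokens, so every query attends more strongly to regions whose shading must be updated after the geometric edit; other designs (e.g., a learned gating of the attention output) would be equally consistent with the abstract.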
Related papers
- World-Shaper: A Unified Framework for 360° Panoramic Editing [57.174341220144605]
Existing perspective-based image editing methods fail to model the spatial structure of panoramas. We present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design. Our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods.
arXiv Detail & Related papers (2026-01-30T19:38:54Z) - Dragging with Geometry: From Pixels to Geometry-Guided Image Editing [42.176957681367185]
We propose a novel geometry-guided drag-based image editing method, GeoDrag. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing.
arXiv Detail & Related papers (2025-09-30T03:53:11Z) - Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control [52.87568958372421]
Follow-Your-Shape is a training-free and mask-free framework that supports precise and controllable editing of object shapes. We compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. Our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
arXiv Detail & Related papers (2025-08-11T16:10:00Z) - Training-free Geometric Image Editing on Diffusion Models [53.38549950608886]
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped. We propose a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement. Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine.
arXiv Detail & Related papers (2025-07-31T07:36:00Z) - SphereDrag: Spherical Geometry-Aware Panoramic Image Editing [53.87789202723925]
We propose SphereDrag, a novel panoramic editing framework utilizing spherical geometry knowledge for accurate and controllable editing. Specifically, adaptive reprojection (AR) uses adaptive spherical rotation to handle discontinuity, and great-circle trajectory adjustment (GCTA) tracks the movement trajectory more accurately. We also construct PanoBench, a panoramic editing benchmark that includes complex editing tasks involving multiple objects and diverse styles and provides a standardized evaluation framework.
arXiv Detail & Related papers (2025-06-13T15:13:09Z) - Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information [4.956066467858058]
We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-14T17:15:26Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space.
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - ObjectStitch: Generative Object Compositing [43.206123360578665]
We propose a self-supervised framework for object compositing using conditional diffusion models.
Our framework can transform the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling.
Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.
arXiv Detail & Related papers (2022-12-02T02:15:13Z) - Neural Parameterization for Dynamic Human Head Editing [26.071370285285465]
We present Neural Parameterization (NeP), a hybrid representation that provides the advantages of both implicit and explicit methods.
NeP is capable of photo-realistic rendering while allowing fine-grained editing of the scene geometry and appearance.
The results show that NeP achieves almost the same level of rendering accuracy while maintaining high editability.
arXiv Detail & Related papers (2022-07-01T05:25:52Z)