Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
- URL: http://arxiv.org/abs/2509.25740v1
- Date: Tue, 30 Sep 2025 03:53:11 GMT
- Title: Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
- Authors: Xinyu Pu, Hongsong Wang, Jie Gui, Pan Zhou
- Abstract summary: We propose GeoDrag, a novel geometry-guided drag-based image editing method. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing.
- Score: 42.176957681367185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive point-based image editing serves as a controllable editor, enabling precise and flexible manipulation of image content. However, most drag-based methods operate primarily on the 2D pixel plane with limited use of 3D cues. As a result, they often produce imprecise and inconsistent edits, particularly in geometry-intensive scenarios such as rotations and perspective transformations. To address these limitations, we propose GeoDrag, a novel geometry-guided drag-based image editing method that addresses three key challenges: 1) incorporating 3D geometric cues into pixel-level editing, 2) mitigating discontinuities caused by geometry-only guidance, and 3) resolving conflicts arising from multi-point dragging. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing in a single forward pass. In addition, a conflict-free partitioning strategy is introduced to isolate editing regions, effectively preventing interference and ensuring consistency. Extensive experiments across various editing scenarios validate the effectiveness of our method, showing superior precision, structural consistency, and reliable multi-point editability. The code will be available at https://github.com/xinyu-pu/GeoDrag.
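The core idea of the abstract — a displacement field that blends a 3D geometric prior with a 2D spatial prior, plus a partition that keeps multi-point drags from interfering — can be illustrated with a toy sketch. This is not the authors' implementation: the function name, the Gaussian falloff, the depth-similarity weighting, and the nearest-handle partition are all illustrative assumptions standing in for the paper's actual components.

```python
import numpy as np

def geometry_guided_displacement(depth, handles, targets, sigma=20.0, alpha=0.5):
    """Toy displacement field in the spirit of geometry-guided dragging.

    For each drag (handle -> target), a 2D displacement is spread over the
    image with a Gaussian spatial falloff (2D prior), modulated by depth
    similarity to the handle point (a stand-in for 3D geometric cues).
    Each pixel is assigned to its nearest handle, so multiple drags never
    overlap -- a crude analogue of conflict-free partitioning.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Squared pixel distance to every handle; nearest handle owns the pixel.
    d2 = np.stack([(ys - hy) ** 2 + (xs - hx) ** 2 for hy, hx in handles])
    owner = d2.argmin(axis=0)

    field = np.zeros((h, w, 2))
    for k, ((hy, hx), (ty, tx)) in enumerate(zip(handles, targets)):
        mask = owner == k                                      # this drag's region
        spatial = np.exp(-d2[k] / (2 * sigma ** 2))            # 2D spatial prior
        geo = np.exp(-np.abs(depth - depth[hy, hx]))           # 3D (depth) prior
        weight = (alpha * geo + (1 - alpha) * spatial) * mask  # unified weight
        field[..., 0] += weight * (ty - hy)
        field[..., 1] += weight * (tx - hx)
    return field
```

At the handle pixel itself both priors equal 1, so the field there reproduces the full handle-to-target displacement; it decays with spatial and depth distance elsewhere, and each pixel is moved by at most one drag.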
Related papers
- World-Shaper: A Unified Framework for 360° Panoramic Editing [57.174341220144605]
Existing perspective-based image editing methods fail to model the spatial structure of panoramas. We present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design. Our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods.
arXiv Detail & Related papers (2026-01-30T19:38:54Z) - POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion [46.97254555348757]
We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing. We introduce a framework for Positioning Objects Consistently and Interactively (POCI-Diff). Our method enables explicit per-object semantic control by binding individual text descriptions to specific 3D bounding boxes.
arXiv Detail & Related papers (2026-01-20T15:13:43Z) - 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing [58.54083747494426]
3DGS-Drag is a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes. Our approach bridges the gap between deformation-based and 2D-editing-based 3D editing methods.
arXiv Detail & Related papers (2026-01-12T19:57:31Z) - FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields [20.793887576117527]
We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image, using an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment.
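The step of projecting 3D mesh displacements into 2D can be sketched with a standard pinhole-camera model: project each vertex before and after deformation and take the difference in pixel coordinates. The function name, focal length, and principal point below are arbitrary assumptions for illustration, not FlowDrag's actual parameters.

```python
import numpy as np

def project_displacements(verts, displaced, f=500.0, cx=128.0, cy=128.0):
    """Project 3D vertex positions before/after deformation through a
    pinhole camera and return the induced 2D pixel displacements.

    verts, displaced: (N, 3) arrays of camera-space points (z > 0).
    """
    def to_2d(p):
        x, y, z = p[:, 0], p[:, 1], p[:, 2]
        # Perspective projection: u = f * x / z + cx, v = f * y / z + cy.
        return np.stack([f * x / z + cx, f * y / z + cy], axis=1)
    return to_2d(displaced) - to_2d(verts)
```

For example, moving a vertex at depth 1 by 0.1 units along x induces a 50-pixel horizontal shift at focal length 500, since the projected offset is f * dx / z.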
arXiv Detail & Related papers (2025-07-11T03:18:52Z) - SphereDrag: Spherical Geometry-Aware Panoramic Image Editing [53.87789202723925]
We propose SphereDrag, a novel panoramic editing framework that utilizes spherical geometry knowledge for accurate and controllable editing. Specifically, adaptive reprojection (AR) uses adaptive spherical rotation to deal with discontinuity, while great-circle trajectory adjustment (GCTA) tracks the movement trajectory more accurately. We also construct PanoBench, a panoramic editing benchmark covering complex editing tasks with multiple objects and diverse styles, which provides a standardized evaluation framework.
arXiv Detail & Related papers (2025-06-13T15:13:09Z) - Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information [4.956066467858058]
We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-14T17:15:26Z) - Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting [55.14822004410817]
We introduce DYG, an effective 3D drag-based editing method for 3D Gaussian Splatting. It enables precise control over the extent of editing through the input of 3D masks and pairs of control points. DYG integrates the strengths of the implicit triplane representation to establish the geometric scaffold of the editing results.
arXiv Detail & Related papers (2025-01-30T18:51:54Z) - PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space.
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.