FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields
- URL: http://arxiv.org/abs/2507.08285v1
- Date: Fri, 11 Jul 2025 03:18:52 GMT
- Title: FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields
- Authors: Gwanhyeong Koo, Sunjae Yoon, Younghwan Lee, Ji Woo Hong, Chang D. Yoo
- Abstract summary: We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image, using an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment.
- Score: 20.793887576117527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drag-based editing allows precise object manipulation through point-based control, offering user convenience. However, current methods often suffer from geometric inconsistency: they focus exclusively on matching user-defined points and neglect the broader geometry, which leads to artifacts or unstable edits. We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image, using an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment while preserving structural integrity. Additionally, existing drag-editing benchmarks provide no ground truth, making it difficult to assess how accurately the edits match the intended transformations. To address this, we present the VFD (VidFrameDrag) benchmark dataset, which provides ground-truth frames using consecutive shots in a video dataset. FlowDrag outperforms existing drag-based editing methods on both VFD Bench and DragBench.
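The step in which mesh displacements are "projected into 2D" can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the camera model or the energy function. The sketch assumes a simple pinhole camera (the hypothetical parameters `focal`, `cx`, `cy`) and takes the rest and deformed vertex positions as given. The helper names `project` and `flow_from_mesh` are illustrative only.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(v: Vec3, focal: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of a camera-space point (assumed camera model)."""
    x, y, z = v
    if z <= 0:
        raise ValueError("vertex must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


def flow_from_mesh(
    rest: List[Vec3],
    deformed: List[Vec3],
    focal: float = 1.0,
    cx: float = 0.0,
    cy: float = 0.0,
) -> List[Tuple[Vec2, Vec2]]:
    """Return (image position, 2D displacement) for each mesh vertex.

    The displacement is the difference between the projections of the
    deformed and the rest vertex, i.e. a sparse 2D vector flow field.
    """
    if len(rest) != len(deformed):
        raise ValueError("rest and deformed meshes need the same vertex count")
    flow = []
    for p, q in zip(rest, deformed):
        u0, v0 = project(p, focal, cx, cy)
        u1, v1 = project(q, focal, cx, cy)
        flow.append(((u0, v0), (u1 - u0, v1 - v0)))
    return flow


# Hypothetical two-vertex mesh: one vertex slides sideways, one moves closer.
rest_vertices = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deformed_vertices = [(1.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
field = flow_from_mesh(rest_vertices, deformed_vertices, focal=2.0)
```

Note that a vertex moving toward the camera also produces a 2D displacement, which is one reason a 3D-aware flow field can differ from a purely 2D drag vector.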
Related papers
- Training-free Geometric Image Editing on Diffusion Models [53.38549950608886]
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped. We propose a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement. Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine.
arXiv Detail & Related papers (2025-07-31T07:36:00Z)
- Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy [36.08715662927022]
We present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing. Our framework supports various precise and physically-consistent manipulations across the video frames, including pose editing, rotation, scaling, translation, texture modification, and object composition.
arXiv Detail & Related papers (2025-06-27T17:59:01Z)
- SphereDrag: Spherical Geometry-Aware Panoramic Image Editing [50.0866506514989]
We propose SphereDrag, a novel panoramic editing framework utilizing spherical geometry knowledge for accurate and controllable editing. Specifically, adaptive reprojection (AR) uses adaptive spherical rotation to deal with discontinuity; great-circle trajectory adjustment (GCTA) tracks the movement trajectory more accurately. We also construct PanoBench, a panoramic editing benchmark that includes complex editing tasks involving multiple objects and diverse styles and provides a standardized evaluation framework.
arXiv Detail & Related papers (2025-06-13T15:13:09Z)
- ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior [7.218737495375119]
ARAP-GS is a drag-driven 3DGS editing framework based on As-Rigid-As-Possible (ARAP) deformation. We are the first to apply ARAP deformation directly to 3D Gaussians, enabling flexible, drag-driven geometric transformations. Our method is highly efficient, requiring only 10 to 20 minutes to edit a scene on a single 3090 GPU.
arXiv Detail & Related papers (2025-04-17T09:37:11Z)
- MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing [64.84885028248395]
MeshPad is a generative approach that creates 3D meshes from sketch inputs. We focus on enabling consistent edits by decomposing editing into 'deletion' of regions of a mesh, followed by 'addition' of new mesh geometry. Our approach is based on a triangle sequence-based mesh representation, exploiting a large Transformer model for mesh triangle addition and deletion.
arXiv Detail & Related papers (2025-03-03T11:27:44Z)
- Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting [55.14822004410817]
We introduce DYG, an effective 3D drag-based editing method for 3D Gaussian Splatting. It enables precise control over the extent of editing through the input of 3D masks and pairs of control points. DYG integrates the strengths of the implicit triplane representation to establish the geometric scaffold of the editing results.
arXiv Detail & Related papers (2025-01-30T18:51:54Z)
- DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions [9.312577767600139]
3D editing has shown remarkable capability in editing scenes based on various instructions. Existing methods struggle with achieving intuitive, localized editing. We introduce DragScene, a framework that integrates drag-style editing with diverse 3D representations.
arXiv Detail & Related papers (2024-12-18T07:02:01Z)
- DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting.
Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z)
- View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z)
- DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors [10.355568895429588]
Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline.
Regularizers are not aware of the global context and semantics of the object.
We show that our deformations can be controlled to yield realistic shape deformations aware of the global context.
arXiv Detail & Related papers (2023-10-06T19:55:40Z)
- FreeDrag: Feature Dragging for Reliable Point-based Image Editing [16.833998026980087]
We propose FreeDrag, a feature dragging methodology designed to relieve the burden of point tracking.
FreeDrag incorporates two key designs: template features with adaptive updating, and line search with backtracking.
Our approach significantly outperforms pre-existing methodologies, offering reliable point-based editing even in various complex scenarios.
arXiv Detail & Related papers (2023-07-10T16:37:46Z)