MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors
- URL: http://arxiv.org/abs/2410.16272v1
- Date: Mon, 21 Oct 2024 17:59:53 GMT
- Title: MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors
- Authors: Honghua Chen, Yushi Lan, Yongwei Chen, Yifan Zhou, Xingang Pan
- Abstract summary: Existing 3D drag-based editing methods fall short in handling significant topology changes or generating new textures across diverse object categories.
We introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing.
We show that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing.
- Score: 19.950368071777092
- Abstract: Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the use of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, followed by a reconstruction model that recovers 3D Gaussians of the edited object. While the initial 3D Gaussians may suffer from misalignment between different views, we address this via view-specific deformation networks that adjust the positions of the Gaussians so that the views are well aligned. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance the view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.
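The abstract sketches a four-stage pipeline: render and drag-edit multiple views with a multi-view diffusion prior, reconstruct 3D Gaussians, align them with view-specific deformation networks, then refine with a multi-view score function. Below is a minimal control-flow sketch of that pipeline; every name and signature is an illustrative placeholder inferred from the abstract, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class Gaussians:
    positions: "Tensor"  # (N, 3) Gaussian centers
    features: "Tensor"   # opacities, scales, rotations, colors

def mvdrag3d_edit(
    source_obj,                                    # input 3D object (any representation)
    drags: Sequence[Tuple["Point3D", "Point3D"]],  # user (handle, target) point pairs
    render_views: Callable,                        # object -> K views + projected 2D drags
    mv_drag_diffusion: Callable,                   # multi-view diffusion drag editing
    reconstruct: Callable,                         # edited views -> initial 3D Gaussians
    align_deformation: Callable,                   # view-specific deformation networks
    mv_score_refine: Callable,                     # multi-view score distillation
) -> Gaussians:
    # Stage 1: render the object from multiple viewpoints and project
    # the user's 3D drag handles into each view.
    views, drags_2d = render_views(source_obj, drags)

    # Stage 2: the multi-view diffusion prior applies the drag edit
    # consistently across all views at once.
    edited = mv_drag_diffusion(views, drags_2d)

    # Stage 3: a feed-forward reconstruction model lifts the edited
    # views into 3D Gaussians (possibly misaligned across views).
    gaussians = reconstruct(edited)

    # Stage 4: view-specific deformation networks re-position the
    # Gaussians, then a multi-view score function distills the diffusion
    # prior to sharpen consistency and visual quality.
    gaussians = align_deformation(gaussians, edited)
    return mv_score_refine(gaussians)
```

Passing the stage functions as callables keeps the sketch agnostic to the concrete diffusion and reconstruction backbones, which the abstract does not specify.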
Related papers
- Manipulating Vehicle 3D Shapes through Latent Space Editing [0.0]
This paper introduces a framework that employs a pre-trained regressor, enabling continuous, precise, attribute-specific modifications to vehicle 3D models.
Our method not only preserves the inherent identity of vehicle 3D objects, but also supports multi-attribute editing, allowing for extensive customization without compromising the model's structural integrity.
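The regressor-guided latent editing this summary describes can be illustrated with a generic sketch: move a latent code until a differentiable attribute regressor reaches a target value, while anchoring the latent to preserve identity. All names (G, R, the anchor weight) are assumptions for illustration, not this paper's actual procedure.

```python
import torch

def edit_attribute(G, R, z, target_value, steps=100, lr=0.01):
    """Generic regressor-guided latent edit (sketch, not the paper's code).

    G: differentiable generator, latent -> 3D shape.
    R: differentiable regressor, shape -> scalar attribute value.
    """
    z = z.clone().requires_grad_(True)
    z0 = z.detach().clone()            # anchor for identity preservation
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        shape = G(z)
        # Pull the regressed attribute toward the target; the anchor term
        # keeps the edit from drifting away from the original identity.
        loss = (R(shape) - target_value) ** 2 + 0.1 * ((z - z0) ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```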
arXiv Detail & Related papers (2024-10-31T13:41:16Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
In the first stage, we employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
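Taken together, the summary implies a generate-curate-reconstruct flow. A hedged sketch of that two-stage control flow follows, where the quality_score curation criterion and all other names are assumptions rather than Flex3D's actual API:

```python
from typing import Callable, List

def flex3d_generate(
    prompt_or_image,
    mv_diffusion: Callable[..., List["Image"]],     # fine-tuned multi-view image diffusion
    video_diffusion: Callable[..., List["Image"]],  # video diffusion for extra candidates
    quality_score: Callable,                        # view-curation criterion (assumed)
    flex_rm: Callable,                              # FlexRM: variable-length views -> 3D
    keep: int = 16,
):
    # Stage 1: pool candidate views from both generators, then curate
    # the pool down to the most useful views.
    pool = mv_diffusion(prompt_or_image) + video_diffusion(prompt_or_image)
    curated = sorted(pool, key=quality_score, reverse=True)[:keep]

    # Stage 2: FlexRM's transformer ingests however many curated views
    # survive, since it can process an arbitrary number of inputs.
    return flex_rm(curated)
```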
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images [72.70883914827687]
Tailor3D is a novel pipeline that creates customized 3D assets from editable dual-side images.
It provides a user-friendly, efficient solution for editing 3D assets, with each editing step taking only seconds to complete.
arXiv Detail & Related papers (2024-07-08T17:59:55Z)
- DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework that addresses the challenges of large-vocabulary 3D object generation with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
- DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting.
Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z)
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing [44.99706994361726]
Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity.
This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images.
MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation.
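One plausible reading of this summary, sketched below: at each ancestral-sampling step, per-view predictions are lifted into a single 3D representation and re-rendered, and sampling continues from the re-rendered, 3D-consistent images. The step functions and the exact conditioning point are assumptions based only on the text above.

```python
from typing import Callable, List

def mvedit_sample(
    views: List["Tensor"],   # initial noised multi-view images
    denoise: Callable,       # per-view clean-image (x0) prediction at step t
    renoise: Callable,       # ancestral-sampling transition to the next step
    lift_to_3d: Callable,    # training-free 3D Adapter: views -> 3D representation
    render: Callable,        # render the 3D representation for view index i
    timesteps: List[int],
):
    for t in timesteps:
        # Denoise each view independently.
        x0 = [denoise(v, t) for v in views]

        # Lift the per-view predictions into one coherent 3D representation
        # and re-render it; this 3D round-trip is what couples the views.
        obj3d = lift_to_3d(x0)
        x0_consistent = [render(obj3d, i) for i in range(len(views))]

        # Continue sampling from the 3D-consistent predictions.
        views = [renoise(x, t) for x in x0_consistent]
    return lift_to_3d(views)
```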
arXiv Detail & Related papers (2024-03-18T17:59:09Z)
- View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes.
By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z)
- Multi-view Inversion for 3D-aware Generative Adversarial Networks [3.95944314850151]
Current 3D GAN inversion methods for human heads typically use only a single frontal image to reconstruct the whole 3D head model.
This leaves out meaningful information when multi-view data or dynamic videos are available.
Our method builds on existing state-of-the-art 3D GAN inversion techniques to allow for consistent and simultaneous inversion of multiple views of the same subject.
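A common way to realize such joint inversion, shown as a sketch under stated assumptions: optimize one shared latent against reconstruction losses from every available view, so that the single latent must explain all of them. G, its mean_latent initializer, and the plain L2 loss are illustrative placeholders, not the paper's exact formulation.

```python
import torch

def invert_multiview(G, images, cams, steps=500, lr=0.01):
    """Jointly invert several views of one subject into a shared latent.

    G(w, cam) is assumed to render a 3D-aware GAN from latent w at
    camera cam; G.mean_latent() is an assumed initialization helper.
    """
    w = G.mean_latent().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        # Summing the per-view errors ties all views to the same latent,
        # which is what enforces cross-view consistency.
        loss = sum(((G(w, c) - img) ** 2).mean() for img, c in zip(images, cams))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```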
arXiv Detail & Related papers (2023-12-08T19:28:40Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)