DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction
Model
- URL: http://arxiv.org/abs/2311.09217v1
- Date: Wed, 15 Nov 2023 18:58:41 GMT
- Title: DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction
Model
- Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan
Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
- Abstract summary: textbfDMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
- Score: 86.37536249046943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose \textbf{DMV3D}, a novel 3D generation approach that uses a
transformer-based 3D large reconstruction model to denoise multi-view
diffusion. Our reconstruction model incorporates a triplane NeRF representation
and can denoise noisy multi-view images via NeRF reconstruction and rendering,
achieving single-stage 3D generation in $\sim$30s on single A100 GPU. We train
\textbf{DMV3D} on large-scale multi-view image datasets of highly diverse
objects using only image reconstruction losses, without accessing 3D assets. We
demonstrate state-of-the-art results for the single-image reconstruction
problem where probabilistic modeling of unseen object parts is required for
generating diverse reconstructions with sharp textures. We also show
high-quality text-to-3D generation results outperforming previous 3D diffusion
models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/ .
Related papers
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z) - MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-detail multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1times$ of the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality
3D Generation [96.32684334038278]
In this paper, we explore the design space of text-to-3D models.
We significantly improve multi-view generation by considering video instead of image generators.
Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x.
arXiv Detail & Related papers (2024-02-13T18:59:51Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textural descriptions, utilize the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.