MVDream: Multi-view Diffusion for 3D Generation
- URL: http://arxiv.org/abs/2308.16512v4
- Date: Thu, 18 Apr 2024 04:12:32 GMT
- Title: MVDream: Multi-view Diffusion for 3D Generation
- Authors: Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang,
- Abstract summary: We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt.
Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings.
- Score: 14.106283556521962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
Related papers
- Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation [66.75243908044538]
We introduce Zero-1-to-G, a novel approach to direct 3D generation on Gaussian splats using pretrained 2D diffusion models.
To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats.
This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects.
arXiv Detail & Related papers (2025-01-09T18:37:35Z) - Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy [29.73743772971411]
We propose Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy.
We leverage a pre-trained 2D diffusion model and a 3D diffusion model via our elegantly designed process.
Our method generates realistic 3D objects and avatars with high-fidelity geometry and texture.
arXiv Detail & Related papers (2024-12-09T17:44:42Z) - Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z) - Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models [29.73743772971411]
We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion.
Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information for each other.
Our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image.
arXiv Detail & Related papers (2024-06-12T17:57:25Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D
Prior [52.44678180286886]
2D diffusion models find a distillation approach that achieves excellent generalization and rich details without any 3D data.
We propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously.
arXiv Detail & Related papers (2023-12-11T18:59:18Z) - Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion [32.13452288549591]
Current methods for creating 3D content are time-consuming and lose generation diversity.
In this work, we employ a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model.
We show that our direct 2.5D generation with the specially-designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds.
arXiv Detail & Related papers (2023-11-27T16:26:54Z) - GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z) - HoloDiffusion: Training a 3D Diffusion Model using 2D Images [71.1144397510333]
We introduce a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision.
We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
arXiv Detail & Related papers (2023-03-29T07:35:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.