V3D: Video Diffusion Models are Effective 3D Generators
- URL: http://arxiv.org/abs/2403.06738v1
- Date: Mon, 11 Mar 2024 14:03:36 GMT
- Title: V3D: Video Diffusion Models are Effective 3D Generators
- Authors: Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu
- Abstract summary: We introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation.
Benefiting from this, the state-of-the-art video diffusion model could be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image.
Our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views.
- Score: 19.33837029942662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic 3D generation has recently attracted widespread attention. Recent
methods have greatly accelerated the generation speed, but usually produce
less-detailed objects due to limited model capacity or 3D data. Motivated by
recent advancements in video diffusion models, we introduce V3D, which
leverages the world simulation capacity of pre-trained video diffusion models
to facilitate 3D generation. To fully unleash the potential of video diffusion
to perceive the 3D world, we further introduce a geometrical consistency prior
and extend the video diffusion model to a multi-view consistent 3D generator.
Benefiting from this, the state-of-the-art video diffusion model could be
fine-tuned to generate 360-degree orbit frames surrounding an object given a
single image. With our tailored reconstruction pipelines, we can generate
high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method
can be extended to scene-level novel view synthesis, achieving precise control
over the camera path with sparse input views. Extensive experiments demonstrate
the superior performance of the proposed approach, especially in terms of
generation quality and multi-view consistency. Our code is available at
https://github.com/heheyas/V3D
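
As a minimal illustration of the camera-path idea described above (360-degree orbit frames around an object, and precise camera-path control for scene-level synthesis), the sketch below builds an orbit of camera-to-world poses with NumPy. This is a hypothetical helper, not code from the V3D repository; the function names (look_at, orbit_cameras) and the defaults (18 frames, radius 2.0, 15-degree elevation) are illustrative assumptions.

# Hypothetical sketch: camera poses on a 360-degree orbit around an object.
# Not the official V3D code; only NumPy is assumed.
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
    """Return a 4x4 camera-to-world matrix looking from cam_pos toward target."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward   # OpenGL-style convention: the camera looks down -z
    c2w[:3, 3] = cam_pos
    return c2w

def orbit_cameras(n_frames=18, radius=2.0, elevation_deg=15.0):
    """Camera-to-world poses evenly spaced on a circular orbit around the origin."""
    elev = np.deg2rad(elevation_deg)
    poses = []
    for azimuth in np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False):
        cam_pos = radius * np.array([
            np.cos(elev) * np.cos(azimuth),
            np.cos(elev) * np.sin(azimuth),
            np.sin(elev),
        ])
        poses.append(look_at(cam_pos, target=np.zeros(3)))
    return np.stack(poses)   # shape (n_frames, 4, 4)

if __name__ == "__main__":
    poses = orbit_cameras()
    print(poses.shape)   # (18, 4, 4): one pose per generated orbit frame

Each returned pose could, in principle, serve as the camera condition for one generated frame of the orbit video; the actual conditioning format used by V3D is defined in the repository linked above.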
Related papers
- Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation [45.95218923564575]
We propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view.
Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (6s on an A100 GPU) than SOTA methods.
arXiv Detail & Related papers (2024-11-21T18:21:24Z) - Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video diffusion-based paradigm that redefines single-image-to-multi-view generation as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with a 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z) - SuperGaussian: Repurposing Video Models for 3D Super Resolution [67.19266415499139]
We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details.
We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution.
arXiv Detail & Related papers (2024-06-02T03:44:50Z) - VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models [20.084928490309313]
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.
By unlocking the video diffusion model's multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model.
The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view samples, can generate a 3D asset from a single image in seconds.
arXiv Detail & Related papers (2024-03-18T17:59:12Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces LN3Diff, a novel framework that provides a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and a variational autoencoder to encode the input image into a structured, compact 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content of higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework, Sculpt3D, that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion [32.29687304798145]
One-2-3-45++ is an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute.
Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data.
arXiv Detail & Related papers (2023-11-14T03:40:25Z) - GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power of the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.