LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
- URL: http://arxiv.org/abs/2402.05054v1
- Date: Wed, 7 Feb 2024 17:57:03 GMT
- Title: LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
- Authors: Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
- Abstract summary: We introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images.
We maintain fast generation, producing 3D objects within 5 seconds, while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
- Score: 51.19871052619077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D content creation has achieved significant progress in terms of both
quality and speed. Although current feed-forward models can produce 3D objects
in seconds, their resolution is constrained by the intensive computation
required during training. In this paper, we introduce Large Multi-View Gaussian
Model (LGM), a novel framework designed to generate high-resolution 3D models
from text prompts or single-view images. Our key insights are two-fold: 1) 3D
Representation: We propose multi-view Gaussian features as an efficient yet
powerful representation, which can then be fused together for differentiable
rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput
backbone operating on multi-view images, which can be produced from text or
single-view image input by leveraging multi-view diffusion models. Extensive
experiments demonstrate the high fidelity and efficiency of our approach.
Notably, we maintain fast generation, producing 3D objects within 5 seconds,
while boosting the training resolution to 512, thereby achieving
high-resolution 3D content generation.
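The two insights lend themselves to a short illustration. Below is a minimal, hypothetical PyTorch sketch (not the authors' code) of how multi-view images might be mapped to per-pixel Gaussian parameters by an asymmetric encoder-decoder and then fused into a single Gaussian point set for differentiable rendering; all layer sizes, channel layouts, and names are assumptions.

```python
# Minimal sketch of the two LGM insights: per-view Gaussian prediction by an
# asymmetric encoder-decoder, then fusion into one point set. Hypothetical
# layer sizes and names; not the official implementation. Requires: torch.
import torch
import torch.nn as nn

class AsymmetricUNetSketch(nn.Module):
    """Toy asymmetric backbone: high-resolution multi-view images in,
    lower-resolution per-pixel Gaussian feature maps out."""
    def __init__(self, in_ch=3, gauss_ch=14):
        # 14 channels: 3 position + 3 scale + 4 rotation (quat) + 3 color + 1 opacity
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Only one upsampling step, so the output is half the input resolution
        # (the "asymmetric" part: decoding is cheaper than encoding).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, gauss_ch, 3, padding=1),
        )

    def forward(self, views):                         # views: (B, V, C, H, W)
        B, V, C, H, W = views.shape
        feats = self.decoder(self.encoder(views.reshape(B * V, C, H, W)))
        return feats.reshape(B, V, *feats.shape[1:])  # (B, V, 14, H/2, W/2)

def fuse_gaussians(per_view_feats):
    """Concatenate per-view Gaussian maps into one flat set of Gaussians,
    ready to hand to a differentiable splatting renderer."""
    B, V, C, h, w = per_view_feats.shape
    return per_view_feats.permute(0, 1, 3, 4, 2).reshape(B, V * h * w, C)

views = torch.randn(1, 4, 3, 64, 64)                 # 4 hypothetical input views
gaussians = fuse_gaussians(AsymmetricUNetSketch()(views))
print(gaussians.shape)                                # torch.Size([1, 4096, 14])
```

The fusion step is deliberately trivial: because each view contributes Gaussians in a shared world frame, "fusing" reduces to concatenation, and all geometric reconciliation happens through the differentiable rendering loss.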
Related papers
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only about $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z)
- Bootstrap3D: Improving 3D Content Creation with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models [66.83681825842135]
InstantMesh is a feed-forward framework for instant 3D mesh generation from a single image.
It features state-of-the-art generation quality and significant training scalability.
We release all the code, weights, and demo of InstantMesh, intending it to make substantial contributions to the 3D generative AI community.
arXiv Detail & Related papers (2024-04-10T17:48:37Z)
- Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation.
We design a simple yet effective denoising strategy via score composition of pretrained video and multi-view diffusion models.
Our framework can generate 4D content within a few minutes.
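The score-composition idea is simple enough to sketch: at each denoising step, combine the noise (score) estimates of a pretrained video diffusion model and a pretrained multi-view diffusion model. The sketch below is a hypothetical illustration; the function name, weighting scheme, and model interfaces are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of score composition across two pretrained diffusion
# models; the weighting and call signatures are assumptions, not the paper's
# exact formulation. Requires: torch.
import torch

def composed_noise_pred(x_t, t, video_model, mview_model, w_video=0.5):
    """Blend the noise predictions of a video diffusion model (temporal
    consistency) and a multi-view diffusion model (geometric consistency)
    into a single denoising direction for one sampler step."""
    eps_video = video_model(x_t, t)
    eps_mview = mview_model(x_t, t)
    return w_video * eps_video + (1.0 - w_video) * eps_mview

# Toy stand-ins so the sketch runs end to end.
video_model = lambda x, t: torch.zeros_like(x)     # placeholder denoiser
mview_model = lambda x, t: 0.1 * x                 # placeholder denoiser
x_t = torch.randn(1, 4, 3, 32, 32)                 # (batch, frames/views, C, H, W)
print(composed_noise_pred(x_t, torch.tensor([500]), video_model, mview_model).shape)
```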
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
- VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data to use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z)
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion [32.29687304798145]
One-2-3-45++ is an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute.
Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data.
arXiv Detail & Related papers (2023-11-14T03:40:25Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Generative Multiplane Neural Radiance for 3D-Aware Image Generation [102.15322193381617]
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
Our GMNR model generates 3D-aware images of $1024 \times 1024$ pixels at 17.6 FPS on a single V100.
arXiv Detail & Related papers (2023-04-03T17:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.