GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2404.19702v1
- Date: Tue, 30 Apr 2024 16:47:46 GMT
- Title: GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
- Authors: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu,
- Abstract summary: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussians from 2-4 posed sparse images in 0.23 seconds on single A100 GPU.
Our model features a very simple transformer-based architecture; we patchify input posed images, pass the primitive multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering.
- Score: 49.32327147931905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, by predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks. Our project webpage is available at: https://sai-bi.github.io/project/gs-lrm/ .
Related papers
- SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
arXiv Detail & Related papers (2024-10-26T00:52:46Z) - Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats [31.37432523412404]
Long-LRM can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU.
Unlike previous feed-forward models that are limited to processing 14 input images, Long-LRM reconstructs the entire scene in a single feed-forward step.
arXiv Detail & Related papers (2024-10-16T17:54:06Z) - MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-detail multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1times$ of the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z) - GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation [85.15374487533643]
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s.
GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information.
We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models.
arXiv Detail & Related papers (2024-03-21T17:59:34Z) - PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape
Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM simultaneously estimates the relative camera poses in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z) - LRM: Large Reconstruction Model for Single Image to 3D [61.47357798633123]
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
arXiv Detail & Related papers (2023-11-08T00:03:52Z) - Bayesian Image Reconstruction using Deep Generative Models [7.012708932320081]
In this work, we leverage state-of-the-art (SOTA) generative models for building powerful image priors.
Our method, called Bayesian Reconstruction through Generative Models (BRGM), uses a single pre-trained generator model to solve different image restoration tasks.
arXiv Detail & Related papers (2020-12-08T17:11:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.