LRM: Large Reconstruction Model for Single Image to 3D
- URL: http://arxiv.org/abs/2311.04400v2
- Date: Sat, 9 Mar 2024 10:47:51 GMT
- Title: LRM: Large Reconstruction Model for Single Image to 3D
- Authors: Yicong Hong and Kai Zhang and Jiuxiang Gu and Sai Bi and Yang Zhou and
Difan Liu and Feng Liu and Kalyan Sunkavalli and Trung Bui and Hao Tan
- Abstract summary: We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
- Score: 61.47357798633123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the first Large Reconstruction Model (LRM) that predicts the 3D
model of an object from a single input image within just 5 seconds. In contrast
to many previous methods that are trained on small-scale datasets such as
ShapeNet in a category-specific fashion, LRM adopts a highly scalable
transformer-based architecture with 500 million learnable parameters to
directly predict a neural radiance field (NeRF) from the input image. We train
our model in an end-to-end manner on massive multi-view data containing around
1 million objects, including both synthetic renderings from Objaverse and real
captures from MVImgNet. This combination of a high-capacity model and
large-scale training data empowers our model to be highly generalizable and
produce high-quality 3D reconstructions from various testing inputs, including
real-world in-the-wild captures and images created by generative models. Video
demos and interactable 3D meshes can be found on our LRM project webpage:
https://yiconghong.me/LRM.
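The abstract describes a feed-forward pipeline (single image, transformer, NeRF) but does not spell out its internals. The sketch below is a rough illustration only: it assumes an image encoder producing tokens, a transformer decoder that emits a triplane feature volume, and a small MLP that turns sampled triplane features into density and color. The triplane representation, the encoder choice, and all dimensions are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRMSketch(nn.Module):
    """Rough sketch: image tokens -> transformer -> triplane -> tiny NeRF MLP.
    The triplane representation, sizes, and encoder are illustrative assumptions."""

    def __init__(self, dim=768, plane_res=32, plane_ch=32):
        super().__init__()
        self.plane_res, self.plane_ch = plane_res, plane_ch
        # Learnable queries that the transformer turns into triplane features.
        self.plane_queries = nn.Parameter(torch.randn(3 * plane_res * plane_res, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=12)
        self.to_plane = nn.Linear(dim, plane_ch)
        # Tiny NeRF head: triplane feature at a 3D point -> (density, r, g, b).
        self.nerf_mlp = nn.Sequential(nn.Linear(plane_ch, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim), e.g. from a frozen ViT encoder of the input image.
        B = image_tokens.shape[0]
        q = self.plane_queries.unsqueeze(0).expand(B, -1, -1)
        planes = self.to_plane(self.decoder(q, image_tokens))          # (B, 3*R*R, C)
        return planes.view(B, 3, self.plane_res, self.plane_res, self.plane_ch)

    def query_points(self, planes, xyz):
        # xyz: (B, P, 3) points in [-1, 1]; sample each axis-aligned plane and sum.
        B, P, _ = xyz.shape
        feats = 0
        for i, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):
            grid = xyz[:, :, [a, b]].view(B, P, 1, 2)                  # (B, P, 1, 2)
            plane = planes[:, i].permute(0, 3, 1, 2)                   # (B, C, R, R)
            sampled = F.grid_sample(plane, grid, align_corners=False)  # (B, C, P, 1)
            feats = feats + sampled.squeeze(-1).permute(0, 2, 1)       # (B, P, C)
        return self.nerf_mlp(feats)                                    # (B, P, 4): density + rgb
```

Training would then render the predicted NeRF from known viewpoints by querying points along rays with `query_points` and compositing, comparing the renders against ground-truth multi-view images end to end, in line with the multi-view supervision the abstract describes.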
Related papers
- L4GM: Large 4D Gaussian Reconstruction Model [99.82220378522624]
We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input.
Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects.
arXiv Detail & Related papers (2024-06-14T17:51:18Z)
- Real3D: Scaling Up Large Reconstruction Models with Real-World Images [34.735198125706326]
Real3D is the first LRM system that can be trained using single-view real-world images.
We propose two unsupervised losses that allow us to supervise LRMs at the pixel- and semantic-level.
We develop an automatic data curation approach to collect high-quality examples from in-the-wild images.
arXiv Detail & Related papers (2024-06-12T17:59:08Z)
- M-LRM: Multi-view Large Reconstruction Model [37.46572626325514]
The Multi-view Large Reconstruction Model (M-LRM) is designed to efficiently reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner.
Compared to the Large Reconstruction Model (LRM), the proposed M-LRM can produce a tri-plane NeRF at $128 \times 128$ resolution and generate 3D shapes of high fidelity.
arXiv Detail & Related papers (2024-06-11T18:29:13Z)
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only about $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z)
- GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting [49.32327147931905]
We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussians from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU.
Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering (a rough sketch of this per-pixel decoding appears after this list).
arXiv Detail & Related papers (2024-04-30T16:47:46Z)
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models [66.83681825842135]
InstantMesh is a feed-forward framework for instant 3D mesh generation from a single image.
It features state-of-the-art generation quality and significant training scalability.
We release all the code, weights, and demo of InstantMesh with the intention that it can make substantial contributions to the community of 3D generative AI.
arXiv Detail & Related papers (2024-04-10T17:48:37Z)
- VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models [20.084928490309313]
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.
By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model.
The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds.
arXiv Detail & Related papers (2024-03-18T17:59:12Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM simultaneously reconstructs the object and estimates the relative camera poses in about 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
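The GS-LRM entry above summarizes its pipeline as: patchify the posed input images, run the multi-view tokens through transformer blocks, and decode per-pixel Gaussian parameters for differentiable rendering. The sketch below is a rough, hedged illustration of that token flow; the patch size, channel counts, and Gaussian parameterization (position, scale, rotation, opacity, color) are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class GSLRMSketch(nn.Module):
    """Rough sketch of a GS-LRM-style reconstructor: multi-view patch tokens ->
    transformer blocks -> per-pixel 3D Gaussian parameters. Sizes are assumptions."""

    def __init__(self, patch=8, in_ch=9, dim=512, depth=12, gauss_ch=14):
        super().__init__()
        # in_ch = 9 assumes RGB plus a 6-D per-pixel ray embedding encoding the camera pose.
        # gauss_ch = 14 assumes 3 position + 3 scale + 4 rotation + 1 opacity + 3 color.
        self.patch, self.gauss_ch = patch, gauss_ch
        self.embed = nn.Linear(in_ch * patch * patch, dim)    # patchify -> one token per patch
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, gauss_ch * patch * patch)  # unpatchify -> per-pixel params

    def forward(self, views):
        # views: (B, V, C, H, W) posed input images with per-pixel camera/ray channels.
        B, V, C, H, W = views.shape
        p = self.patch
        x = views.view(B, V, C, H // p, p, W // p, p)
        x = x.permute(0, 1, 3, 5, 2, 4, 6).reshape(B, V * (H // p) * (W // p), C * p * p)
        tokens = self.blocks(self.embed(x))        # joint self-attention across all views
        g = self.head(tokens)                      # (B, num_tokens, gauss_ch * p * p)
        g = g.view(B, V, H // p, W // p, self.gauss_ch, p, p)
        g = g.permute(0, 1, 4, 2, 5, 3, 6).reshape(B, V, self.gauss_ch, H, W)
        return g  # per-pixel Gaussian parameters for every input view
```

A splatting renderer (for example, a differentiable Gaussian rasterizer) would consume these per-pixel parameters to render novel views, with image-space losses providing the end-to-end training signal.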