Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers
- URL: http://arxiv.org/abs/2312.09147v2
- Date: Sat, 16 Dec 2023 04:31:20 GMT
- Title: Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers
- Authors: Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang,
Yan-Pei Cao and Song-Hai Zhang
- Abstract summary: We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
- Score: 37.14235383028582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in 3D reconstruction from single images have been driven
by the evolution of generative models. Prominent among these are methods based
on Score Distillation Sampling (SDS) and the adaptation of diffusion models in
the 3D domain. Despite their progress, these techniques often face limitations
due to slow optimization or rendering processes, leading to extensive training
and optimization times. In this paper, we introduce a novel approach for
single-view reconstruction that efficiently generates a 3D model from a single
image via feed-forward inference. Our method utilizes two transformer-based
networks, namely a point decoder and a triplane decoder, to reconstruct 3D
objects using a hybrid Triplane-Gaussian intermediate representation. This
hybrid representation strikes a balance, achieving a faster rendering speed
compared to implicit representations while simultaneously delivering superior
rendering quality than explicit representations. The point decoder is designed
for generating point clouds from single images, offering an explicit
representation which is then utilized by the triplane decoder to query Gaussian
features for each point. This design choice addresses the challenges associated
with directly regressing explicit 3D Gaussian attributes characterized by their
non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to
enable rapid rendering through splatting. Both decoders are built upon a
scalable, transformer-based architecture and have been efficiently trained on
large-scale 3D datasets. The evaluations conducted on both synthetic datasets
and real-world images demonstrate that our method not only achieves higher
quality but also ensures a faster runtime in comparison to previous
state-of-the-art techniques. Please see our project page at
https://zouzx.github.io/TriplaneGaussian/.
Related papers
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z) - Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction [153.52406455209538]
Gamba is an end-to-end 3D reconstruction model from a single-view image.
It completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU.
arXiv Detail & Related papers (2024-03-27T17:40:14Z) - latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction [48.86083272054711]
latentSplat is a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture.
We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.
arXiv Detail & Related papers (2024-03-24T20:48:36Z) - AGG: Amortized Generative 3D Gaussians for Single Image to 3D [108.38567665695027]
We introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image.
AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization.
We propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module.
arXiv Detail & Related papers (2024-01-08T18:56:33Z) - Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z) - pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction [26.72289913260324]
pixelSplat is a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.
arXiv Detail & Related papers (2023-12-19T17:03:50Z) - Real-Time Radiance Fields for Single-Image Portrait View Synthesis [85.32826349697972]
We present a one-shot method to infer and render a 3D representation from a single unposed image in real-time.
Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering.
Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization.
arXiv Detail & Related papers (2023-05-03T17:56:01Z) - TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.