InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
- URL: http://arxiv.org/abs/2403.20309v4
- Date: Tue, 17 Dec 2024 18:59:07 GMT
- Title: InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
- Authors: Zhiwen Fan, Kairun Wen, Wenyan Cong, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang,
- Abstract summary: We introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images.
InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned by progressively expanding the scene.
It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 than COLMAP with 3D-GS, and is compatible with multiple 3D representations.
- Score: 91.77050739918037
- License:
- Abstract: While neural 3D reconstruction has advanced substantially, it typically requires densely captured multi-view data with carefully initialized poses (e.g., using COLMAP). However, this requirement limits its broader applicability, as Structure-from-Motion (SfM) is often unreliable in sparse-view scenarios where feature matches are limited, resulting in cumulative errors. In this paper, we introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images. InstantSplat adopts a self-supervised framework that bridges the gap between 2D images and 3D representations using Gaussian Bundle Adjustment (GauBA) and can be optimized in an end-to-end manner. InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned geometry by progressively expanding the scene avoiding redundancy. Gaussian Bundle Adjustment is used to adapt both the scene representation and camera parameters quickly by minimizing gradient-based photometric error. Overall, InstantSplat achieves large-scale 3D reconstruction in mere seconds by reducing the required number of input views. It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 than COLMAP with 3D-GS, and is compatible with multiple 3D representations (3D-GS, 2D-GS, and Mip-Splatting).
Related papers
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z) - No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs [29.669534899109028]
We introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs.
Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information.
Splatt3R can reconstruct scenes at 4FPS at 512 x 512 resolution, and the resultant splats can be rendered in real-time.
arXiv Detail & Related papers (2024-08-25T18:27:20Z) - Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis [11.236094544193605]
Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities.
We propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting.
arXiv Detail & Related papers (2024-08-10T21:23:08Z) - PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled sensitivity pruning score that preserves visual fidelity and foreground details at significantly higher compression ratios.
We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing its training pipeline.
arXiv Detail & Related papers (2024-06-14T17:53:55Z) - LP-3DGS: Learning to Prune 3D Gaussian Splatting [71.97762528812187]
We propose learning-to-prune 3DGS, where a trainable binary mask is applied to the importance score that can find optimal pruning ratio automatically.
Experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.
arXiv Detail & Related papers (2024-05-29T05:58:34Z) - 2L3: Lifting Imperfect Generated 2D Images into Accurate 3D [16.66666619143761]
Multi-view (MV) 3D reconstruction is a promising solution to fuse generated MV images into consistent 3D objects.
However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, leading to poor reconstruction quality.
We present a novel 3D reconstruction framework that leverages intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation to cope with the three issues.
arXiv Detail & Related papers (2024-01-29T02:30:31Z) - pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction [26.72289913260324]
pixelSplat is a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.
arXiv Detail & Related papers (2023-12-19T17:03:50Z) - Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
arXiv Detail & Related papers (2023-12-14T17:18:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.