KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
- URL: http://arxiv.org/abs/2412.20767v1
- Date: Mon, 30 Dec 2024 07:32:35 GMT
- Title: KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
- Authors: Keng-Wei Chang, Zi-Ming Wang, Shang-Hong Lai
- Abstract summary: We present an efficient framework that operates without any depth or matching model.
We propose a coarse-to-fine frequency-aware densification to reconstruct different levels of details.
- Score: 14.792295042683254
- License:
- Abstract: Reconstructing high-quality 3D models from sparse 2D images has garnered significant attention in computer vision. Recently, 3D Gaussian Splatting (3DGS) has gained prominence due to its explicit representation with efficient training speed and real-time rendering capabilities. However, existing methods still heavily depend on accurate camera poses for reconstruction. Although some recent approaches attempt to train 3DGS models without the Structure-from-Motion (SfM) preprocessing from monocular video datasets, these methods suffer from prolonged training times, making them impractical for many applications. In this paper, we present an efficient framework that operates without any depth or matching model. Our approach initially uses SfM to quickly obtain rough camera poses within seconds, and then refines these poses by leveraging the dense representation in 3DGS. This framework effectively addresses the issue of long training times. Additionally, we integrate the densification process with joint refinement and propose a coarse-to-fine frequency-aware densification to reconstruct different levels of details. This approach prevents camera pose estimation from being trapped in local minima or drifting due to high-frequency signals. Our method significantly reduces training time from hours to minutes while achieving more accurate novel view synthesis and camera pose estimation compared to previous methods.
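The abstract does not say how the coarse-to-fine, frequency-aware schedule is implemented. As a rough illustration only, the sketch below assumes one common way to keep high-frequency signal away from early pose optimization: low-pass filter the target image with a Gaussian whose width shrinks as training proceeds. The names `sigma_at`, `gaussian_kernel`, and `low_pass`, the linear schedule, and the default values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sigma_at(step, total_steps, sigma_start=4.0, sigma_end=0.0):
    """Hypothetical linear schedule: strong blur early (coarse), none late (fine)."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return (1.0 - t) * sigma_start + t * sigma_end


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel; a single tap of 1.0 when sigma <= 0."""
    if sigma <= 0:
        return np.array([1.0])
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def low_pass(image, sigma):
    """Separable Gaussian blur of a 2D grayscale image with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Filter along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


if __name__ == "__main__":
    # Assumed usage: blur the photometric target before computing the loss.
    target = np.indices((32, 32)).sum(axis=0) % 2.0  # checkerboard, all high frequency
    for step in (0, 500, 1000):
        s = sigma_at(step, total_steps=1000)
        print(step, s, low_pass(target, s).std())
```

In a real training loop the filtered image would replace the raw target in the photometric loss for the current step. Whether KeyGS filters images, gates densification by frequency, or does something else is not stated in the abstract.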
Related papers
- Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video [64.38566659338751]
We propose the first 4D Gaussian Splatting framework to reconstruct a high-quality 4D model from blurry monocular video, named Deblur4DGS.
We introduce exposure regularization to avoid trivial solutions, as well as multi-frame and multi-resolution consistency regularizations to alleviate artifacts. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives, including deblurring, frame synthesis, and video stabilization.
arXiv Detail & Related papers (2024-12-09T12:02:11Z)
- ZeroGS: Training 3D Gaussian Splatting from Unposed Images [62.34149221132978]
We propose ZeroGS to train 3DGS from hundreds of unposed and unordered images.
Our method leverages a pretrained foundation model as the neural scene representation.
Our method recovers more accurate camera poses than state-of-the-art pose-free NeRF/3DGS methods.
arXiv Detail & Related papers (2024-11-24T11:20:48Z)
- Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization [11.418632671254564]
3D Gaussian Splatting has emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images.
We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals.
We show results on real-world scenes and complex trajectories through simulated environments.
arXiv Detail & Related papers (2024-10-11T12:01:15Z)
- LP-3DGS: Learning to Prune 3D Gaussian Splatting [71.97762528812187]
We propose learning-to-prune 3DGS, in which a trainable binary mask is applied to the importance score so that the optimal pruning ratio is found automatically.
Experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.
arXiv Detail & Related papers (2024-05-29T05:58:34Z)
- EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images [36.91327728871551]
3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis.
We introduce Event Stream Assisted Gaussian Splatting (EvaGaussians), a novel approach that integrates event streams captured by an event camera to assist in reconstructing high-quality 3D-GS from blurry images.
arXiv Detail & Related papers (2024-05-29T04:59:27Z)
- A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose [44.13819148680788]
We develop a novel construct-and-optimize method for sparse view synthesis without camera poses.
Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world.
We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
arXiv Detail & Related papers (2024-05-06T17:36:44Z)
- Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting [10.06208115191838]
We present a bootstrapping method to enhance the rendering of novel views using trained 3D-GS.
Our results indicate that bootstrapping effectively reduces artifacts and yields clear improvements on the evaluation metrics.
arXiv Detail & Related papers (2024-04-29T12:57:05Z)
- COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing.
We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time.
Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
arXiv Detail & Related papers (2023-12-12T18:39:52Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.