Q-SLAM: Quadric Representations for Monocular SLAM
- URL: http://arxiv.org/abs/2403.08125v1
- Date: Tue, 12 Mar 2024 23:27:30 GMT
- Title: Q-SLAM: Quadric Representations for Monocular SLAM
- Authors: Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang,
Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan
- Abstract summary: Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries.
Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise.
We propose a novel approach that reimagines volumetric representations through the lens of quadric forms.
- Score: 89.05457684629621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular SLAM has long grappled with the challenge of accurately modeling 3D
geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular
SLAM have shown promise, yet these methods typically focus on novel view
synthesis rather than precise 3D geometry modeling. This focus results in a
significant disconnect between NeRF applications, i.e., novel-view synthesis,
and the requirements of SLAM. We identify that this gap stems from the
volumetric representations used in NeRF, which are often dense and noisy. In
this study, we propose a novel approach that reimagines volumetric
representations through the lens of quadric forms. We posit that most scene
components can be effectively represented as quadric planes. Leveraging this
assumption, we reshape volumetric representations, replacing millions of cubes
with a few quadric planes, which leads to more accurate and efficient modeling of
3D scenes in SLAM contexts. Our method involves two key steps: First, we use
the quadric assumption to enhance coarse depth estimations obtained from
tracking modules, e.g., Droid-SLAM. This step alone significantly improves
depth estimation accuracy. Second, in the subsequent mapping phase, we diverge
from previous NeRF-based SLAM methods that distribute sampling points across
the entire volume space. Instead, we concentrate sampling points around quadric
planes and aggregate them using a novel quadric-decomposed Transformer.
Additionally, we introduce an end-to-end joint optimization strategy that
synchronizes pose estimation with 3D reconstruction.
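To make the two steps above concrete, the following is a minimal NumPy sketch (an illustration, not the paper's implementation): it fits a quadric surface z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to a noisy local depth patch by linear least squares, mirroring the quadric-based refinement of coarse depth from a tracking module, and then draws ray samples in a narrow band around the fitted surface instead of spreading them across the whole volume. The function names, the synthetic patch, and the Gaussian sampling band are illustrative assumptions.

# Minimal sketch of the quadric assumption in Q-SLAM (illustrative only).
import numpy as np

def fit_quadric_patch(points: np.ndarray) -> np.ndarray:
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to an (N, 3) point patch
    by linear least squares and return the 6 quadric coefficients."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.stack([x * x, y * y, x * y, x, y, np.ones_like(x)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def quadric_depth(xy: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Evaluate the fitted quadric at (N, 2) image-plane coordinates,
    replacing noisy depth with the surface depth (step 1: refinement)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.stack([x * x, y * y, x * y, x, y, np.ones_like(x)], axis=1)
    return A @ coeffs

def sample_near_surface(surface_depth: np.ndarray, n_samples: int = 8,
                        band: float = 0.05, rng=None) -> np.ndarray:
    """Concentrate per-ray samples in a narrow Gaussian band around the
    quadric surface depth, rather than over the entire volume (step 2)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=band, size=(surface_depth.shape[0], n_samples))
    return surface_depth[:, None] + noise

if __name__ == "__main__":
    # Synthetic curved patch with noisy coarse depth, standing in for the
    # output of a tracking module such as Droid-SLAM.
    rng = np.random.default_rng(0)
    xy = rng.uniform(-1.0, 1.0, size=(500, 2))
    true_z = 0.3 * xy[:, 0] ** 2 + 0.1 * xy[:, 0] * xy[:, 1] + 2.0
    noisy_z = true_z + rng.normal(scale=0.05, size=true_z.shape)

    coeffs = fit_quadric_patch(np.column_stack([xy, noisy_z]))
    refined_z = quadric_depth(xy, coeffs)

    print("noisy depth RMSE:  ", np.sqrt(np.mean((noisy_z - true_z) ** 2)))
    print("refined depth RMSE:", np.sqrt(np.mean((refined_z - true_z) ** 2)))

    samples = sample_near_surface(refined_z, n_samples=8, band=0.05, rng=rng)
    print("ray samples per pixel:", samples.shape[1])

On this synthetic patch the refined depth error drops well below the noisy input, which conveys the intuition for why the quadric prior alone already improves depth accuracy; the quadric-decomposed Transformer and the joint pose-reconstruction optimization described in the abstract are beyond the scope of this sketch.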
Related papers
- LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives, octahedra and tetrahedra, both of which define homogeneous volumes bounded by triangular faces.
This formulation aligns naturally with standard mesh-based tools, minimizing overhead for downstream applications.
We demonstrate comparable performance to state-of-the-art volumetric methods while requiring fewer primitives to achieve similar reconstruction fidelity.
arXiv Detail & Related papers (2025-01-27T18:49:38Z) - Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering [47.879695094904015]
We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets.
Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms.
Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns.
arXiv Detail & Related papers (2024-10-06T23:01:57Z) - Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs [27.364205809607302]
3D Gaussian Splatting (GS) struggles significantly to represent the underlying 3D scene geometry accurately.
We address this limitation, undertaking a comprehensive analysis of the integration of depth priors throughout the optimization process.
Our depth-supervision strategy dynamically exploits depth cues from a readily available stereo network, processing virtual stereo pairs rendered by the GS model itself during training to achieve consistent self-improvement.
arXiv Detail & Related papers (2024-09-11T17:59:58Z) - Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis [11.236094544193605]
Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities.
We propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting.
arXiv Detail & Related papers (2024-08-10T21:23:08Z) - ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive
depth range and depth interval [19.28042366225802]
Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z) - Multi-View Reconstruction using Signed Ray Distance Functions (SRDF) [22.75986869918975]
We investigate a new computational approach that builds on a novel volumetric shape representation.
The shape energy associated with this representation evaluates 3D geometry given color images and does not need appearance prediction.
In practice we propose an implicit shape representation, the SRDF, based on signed distances which we parameterize by depths along camera rays.
arXiv Detail & Related papers (2022-08-31T19:32:17Z) - Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications.
Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering.
Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z) - Geometric Correspondence Fields: Learned Differentiable Rendering for 3D
Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)