Q-SLAM: Quadric Representations for Monocular SLAM
- URL: http://arxiv.org/abs/2403.08125v1
- Date: Tue, 12 Mar 2024 23:27:30 GMT
- Title: Q-SLAM: Quadric Representations for Monocular SLAM
- Authors: Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang,
Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan
- Abstract summary: Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries.
Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise.
We propose a novel approach that reimagines volumetric representations through the lens of quadric forms.
- Score: 89.05457684629621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular SLAM has long grappled with the challenge of accurately modeling 3D
geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular
SLAM have shown promise, yet these methods typically focus on novel view
synthesis rather than precise 3D geometry modeling. This focus results in a
significant disconnect between NeRF applications, i.e., novel-view synthesis,
and the requirements of SLAM. We identify that this gap stems from the
volumetric representations used in NeRF, which are often dense and noisy. In
this study, we propose a novel approach that reimagines volumetric
representations through the lens of quadric forms. We posit that most scene
components can be effectively represented as quadric planes. Leveraging this
assumption, we reshape volumetric representations, replacing millions of cubes
with a few quadric planes, which leads to more accurate and efficient modeling of
3D scenes in SLAM contexts. Our method involves two key steps: First, we use
the quadric assumption to enhance coarse depth estimations obtained from
tracking modules, e.g., Droid-SLAM. This step alone significantly improves
depth estimation accuracy. Second, in the subsequent mapping phase, we diverge
from previous NeRF-based SLAM methods that distribute sampling points across
the entire volume space. Instead, we concentrate sampling points around quadric
planes and aggregate them using a novel quadric-decomposed Transformer.
Additionally, we introduce an end-to-end joint optimization strategy that
synchronizes pose estimation with 3D reconstruction.
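To make the two steps above concrete, the following is a minimal NumPy sketch (an illustration, not the paper's implementation): it fits a quadric surface z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to a noisy local depth patch by linear least squares, mirroring the quadric-based refinement of coarse depth from a tracking module, and then draws ray samples in a narrow band around the fitted surface instead of spreading them across the whole volume. The function names, the synthetic patch, and the Gaussian sampling band are illustrative assumptions.

# Minimal sketch of the quadric assumption in Q-SLAM (illustrative only).
import numpy as np

def fit_quadric_patch(points: np.ndarray) -> np.ndarray:
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to an (N, 3) point patch
    by linear least squares and return the 6 quadric coefficients."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.stack([x * x, y * y, x * y, x, y, np.ones_like(x)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def quadric_depth(xy: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Evaluate the fitted quadric at (N, 2) image-plane coordinates,
    replacing noisy depth with the surface depth (step 1: refinement)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.stack([x * x, y * y, x * y, x, y, np.ones_like(x)], axis=1)
    return A @ coeffs

def sample_near_surface(surface_depth: np.ndarray, n_samples: int = 8,
                        band: float = 0.05, rng=None) -> np.ndarray:
    """Concentrate per-ray samples in a narrow Gaussian band around the
    quadric surface depth, rather than over the entire volume (step 2)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=band, size=(surface_depth.shape[0], n_samples))
    return surface_depth[:, None] + noise

if __name__ == "__main__":
    # Synthetic curved patch with noisy coarse depth, standing in for the
    # output of a tracking module such as Droid-SLAM.
    rng = np.random.default_rng(0)
    xy = rng.uniform(-1.0, 1.0, size=(500, 2))
    true_z = 0.3 * xy[:, 0] ** 2 + 0.1 * xy[:, 0] * xy[:, 1] + 2.0
    noisy_z = true_z + rng.normal(scale=0.05, size=true_z.shape)

    coeffs = fit_quadric_patch(np.column_stack([xy, noisy_z]))
    refined_z = quadric_depth(xy, coeffs)

    print("noisy depth RMSE:  ", np.sqrt(np.mean((noisy_z - true_z) ** 2)))
    print("refined depth RMSE:", np.sqrt(np.mean((refined_z - true_z) ** 2)))

    samples = sample_near_surface(refined_z, n_samples=8, band=0.05, rng=rng)
    print("ray samples per pixel:", samples.shape[1])

On this synthetic patch the refined depth error drops well below the noisy input, which conveys the intuition for why the quadric prior alone already improves depth accuracy; the quadric-decomposed Transformer and the joint pose-reconstruction optimization described in the abstract are beyond the scope of this sketch.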
Related papers
- LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives, octahedra and tetrahedra, both of which define homogeneous volumes bounded by triangular faces.
This formulation aligns naturally with standard mesh-based tools, minimizing overhead for downstream applications.
We demonstrate comparable performance to state-of-the-art volumetric methods while requiring fewer primitives to achieve similar reconstruction fidelity.
arXiv Detail & Related papers (2025-01-27T18:49:38Z) - Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering [47.879695094904015]
We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets.
Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms.
Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns.
arXiv Detail & Related papers (2024-10-06T23:01:57Z) - Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs [27.364205809607302]
3D Gaussian Splatting (GS) struggles significantly to represent the underlying 3D scene geometry accurately.
We address this limitation, undertaking a comprehensive analysis of the integration of depth priors throughout the optimization process.
Our depth-supervision strategy dynamically exploits depth cues from a readily available stereo network, processing virtual stereo pairs rendered by the GS model itself during training to achieve consistent self-improvement.
arXiv Detail & Related papers (2024-09-11T17:59:58Z) - Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis [11.236094544193605]
Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities.
We propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting.
arXiv Detail & Related papers (2024-08-10T21:23:08Z) - ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive
depth range and depth interval [19.28042366225802]
Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z) - Multi-View Reconstruction using Signed Ray Distance Functions (SRDF) [22.75986869918975]
We investigate a new computational approach that builds on a novel volumetric shape representation.
The shape energy associated with this representation evaluates 3D geometry given color images and does not need appearance prediction.
In practice we propose an implicit shape representation, the SRDF, based on signed distances which we parameterize by depths along camera rays.
arXiv Detail & Related papers (2022-08-31T19:32:17Z) - Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications.
Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering.
Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z) - Geometric Correspondence Fields: Learned Differentiable Rendering for 3D
Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)