MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
- URL: http://arxiv.org/abs/2412.20082v2
- Date: Mon, 07 Apr 2025 15:51:31 GMT
- Title: MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
- Authors: Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li
- Abstract summary: MambaVO conducts robust, Mamba-based matching and training to enhance the matching quality and improve the pose estimation. On public benchmarks, MambaVO and MambaVO++ demonstrate SOTA performance, while ensuring real-time running.
- Score: 13.827464353174182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep visual odometry has demonstrated great advancements by learning-to-optimize technology. This approach heavily relies on the visual matching across frames. However, ambiguous matching in challenging scenarios leads to significant errors in geometric modeling and bundle adjustment optimization, which undermines the accuracy and robustness of pose estimation. To address this challenge, this paper proposes MambaVO, which conducts robust initialization, Mamba-based sequential matching refinement, and smoothed training to enhance the matching quality and improve the pose estimation. Specifically, the new frame is matched with the closest keyframe in the maintained Point-Frame Graph (PFG) via the semi-dense based Geometric Initialization Module (GIM). Then the initialized PFG is processed by a proposed Geometric Mamba Module (GMM), which exploits the matching features to refine the overall inter-frame matching. The refined PFG is finally processed by differentiable BA to optimize the poses and the map. To deal with the gradient variance, a Trending-Aware Penalty (TAP) is proposed to smooth training and enhance convergence and stability. A loop closure module is finally applied to enable MambaVO++. On public benchmarks, MambaVO and MambaVO++ demonstrate SOTA performance, while ensuring real-time running.
Related papers
- FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z) - Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution [45.74812546007778]
Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity. We propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba.
arXiv Detail & Related papers (2025-11-05T06:46:17Z) - Sparse Deformable Mamba for Hyperspectral Image Classification [1.3471768511567523]
Mamba models significantly improve hyperspectral image (HSI) classification.
One critical challenge is the difficulty in building the sequence of Mamba tokens efficiently.
This paper presents a Sparse Deformable Mamba (SDMamba) approach for enhanced HSI classification.
arXiv Detail & Related papers (2025-04-13T06:08:19Z) - Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z) - Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities.
We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z) - Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction.
We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages.
We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network [9.987504237289832]
We propose a novel Low-Light Enhancement method via relighting-guided Mamba with a deep unfolding network (LLEMamba).
Our LLEMamba first constructs a Retinex model with deep priors, embedding the iterative optimization process based on the Alternating Direction Method of Multipliers (ADMM) within a deep unfolding network.
Unlike Transformer, to assist the deep unfolding framework with multiple iterations, the proposed LLEMamba introduces a novel Mamba architecture with lower computational complexity.
arXiv Detail & Related papers (2024-06-03T06:23:28Z) - PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images [60.33197938330409]
PyMAF-X is a regression-based approach to recovering parametric full-body models from monocular images.
PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results.
arXiv Detail & Related papers (2022-07-13T17:58:33Z) - DFM: A Performance Baseline for Deep Feature Matching [10.014010310188821]
The proposed method uses a pre-trained VGG architecture as a feature extractor and does not require any additional training specific to improve matching.
Our algorithm achieves overall scores of 0.57 and 0.80 in terms of Mean Matching Accuracy (MMA) for 1-pixel and 2-pixel thresholds, respectively, on the HPatches dataset.
arXiv Detail & Related papers (2021-06-14T22:55:06Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with an attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)