MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
- URL: http://arxiv.org/abs/2509.20757v2
- Date: Mon, 29 Sep 2025 00:40:50 GMT
- Title: MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
- Authors: Yuxuan Zhou, Xingxing Li, Shengyu Li, Zhuohao Yan, Chunxi Xia, Shaoquan Feng,
- Abstract summary: We propose MASt3R-Fusion, a multi-sensor-assisted visual SLAM framework that integrates feed-forward pointmap regression with complementary sensor information. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness.
- Score: 12.158063913401575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM is a cornerstone technique in robotics, autonomous driving and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advancements in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion, a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visual alignment constraints (in Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing visual-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
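The core fusion idea, folding a Sim(3) visual alignment constraint in Hessian form into a metric-scale factor graph, has a compact least-squares reading. The NumPy sketch below is an illustration under assumed conventions, not the released code: it linearizes a pointmap-alignment residual over a 7-DoF Sim(3) perturbation (d log s, d omega, d t) and condenses it into a quadratic (H, g) pair, which is the form a factor graph can absorb as a single constraint.

```python
import numpy as np

def skew(v):
    """3x3 matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def sim3_apply(s, R, t, pts):
    """Sim(3) action x -> s * R @ x + t on an (N, 3) point array."""
    return s * (pts @ R.T) + t

def sim3_hessian_factor(s, R, t, src, dst, sigma=0.01):
    """Linearize r_i = s*R*src_i + t - dst_i around the current estimate
    and condense it to Hessian form (H, g), i.e. the quadratic
    0.5 * dx^T H dx + g^T dx over dx = (d log s, d omega, d t),
    using a left perturbation of the rotation."""
    r = (sim3_apply(s, R, t, src) - dst).reshape(-1)
    J = np.zeros((3 * len(src), 7))
    for i, p in enumerate(src):
        q = s * (R @ p)
        J[3*i:3*i+3, 0] = q              # d r / d log(s)
        J[3*i:3*i+3, 1:4] = -skew(q)     # d r / d omega
        J[3*i:3*i+3, 4:7] = np.eye(3)    # d r / d t
    w = 1.0 / sigma**2
    return w * (J.T @ J), w * (J.T @ r)
```

Once the visual evidence is in (H, g) form, a metric-scale SE(3) graph can absorb it alongside IMU and GNSS factors and fix or marginalize the scale direction, rather than re-deriving the dense pointmap residuals at every solve.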
Related papers
- VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM [75.55522219717137]
We present VIGS-SLAM, a visual-inertial 3D Gaussian Splatting SLAM system. It achieves robust real-time tracking and high-fidelity reconstruction. Our method tightly couples visual and inertial cues within a unified optimization framework.
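The "tight coupling" claim has a concrete least-squares reading: inertial and visual residuals constrain the same pose/velocity states, so a single optimizer trades them off jointly. The toy NumPy sketch below illustrates this under assumed conventions (a preintegration-style IMU term and a pinhole reprojection term); it is not the VIGS-SLAM code.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed world gravity vector

def imu_residual(state_i, state_j, dt, dv_meas, dp_meas):
    """Preintegration-style residual between frames i and j: compare the
    measured velocity/position deltas (expressed in frame i) with the
    deltas predicted from the two states under constant gravity."""
    p_i, v_i, R_i = state_i
    p_j, v_j, _ = state_j
    dv_pred = R_i.T @ (v_j - v_i - GRAVITY * dt)
    dp_pred = R_i.T @ (p_j - p_i - v_i * dt - 0.5 * GRAVITY * dt**2)
    return np.concatenate([dv_pred - dv_meas, dp_pred - dp_meas])

def reproj_residual(K, state_j, landmark, pixel):
    """Pinhole reprojection error of one landmark observed in frame j."""
    p_j, _, R_j = state_j
    x_cam = R_j.T @ (landmark - p_j)   # world -> camera
    u, v, z = K @ x_cam
    return np.array([u / z, v / z]) - pixel

def coupled_cost(state_i, state_j, imu_meas, visual_meas):
    """Tight coupling: both residual types constrain the same states,
    so one Gauss-Newton solve balances them jointly."""
    r_imu = imu_residual(state_i, state_j, *imu_meas)
    r_vis = [reproj_residual(K, state_j, lm, px) for (K, lm, px) in visual_meas]
    return np.sum(r_imu**2) + sum(np.sum(r**2) for r in r_vis)
```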
arXiv Detail & Related papers (2025-12-02T00:19:13Z)
- EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly [8.803716785929936]
EGG-Fusion is a novel differentiable-rendering-based real-time reconstruction system. The proposed system achieves a surface reconstruction error of 0.6 cm, representing over 20% improvement in accuracy compared to state-of-the-art methods. Notably, the system maintains real-time processing capabilities at 24 FPS, establishing it as one of the most accurate differentiable-rendering-based real-time reconstruction systems.
arXiv Detail & Related papers (2025-12-01T05:32:17Z)
- SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors [80.51557267896938]
SING3R-SLAM is a globally consistent and compact Gaussian-based dense RGB SLAM framework. We show that SING3R-SLAM achieves state-of-the-art tracking, 3D reconstruction, and novel view rendering, resulting in over 12% improvement in tracking and producing finer, more detailed geometry.
arXiv Detail & Related papers (2025-11-21T12:40:55Z)
- RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation [75.61028930882144]
We identify and quantify this critical issue, demonstrating a significant performance gap in 3D object detection when using synthetic versus real data. We introduce Reinforcement Learning with Geometric Feedback (RLGF), which refines video diffusion models by incorporating rewards from specialized latent-space AD perception models. RLGF substantially reduces geometric errors (e.g., VP error by 21%, Depth error by 57%) and dramatically improves 3D object detection mAP by 12.7%, narrowing the gap to real-data performance.
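One way to read "incorporating rewards from perception models" is a REINFORCE-style fine-tuning loop. The PyTorch sketch below is a generic illustration, not the paper's pipeline: `generator`, `geom_scorer`, and their methods are hypothetical stand-ins.

```python
import torch

def rlgf_style_step(generator, geom_scorer, optimizer, noise, beta=1.0):
    """One reward-weighted update: sample a clip, score its geometric
    fidelity with a frozen perception model, then push the generator's
    log-likelihood up in proportion to the reward. All objects here are
    assumed interfaces, not the paper's actual API."""
    video, log_prob = generator.sample_with_log_prob(noise)
    with torch.no_grad():
        reward = -geom_scorer(video)   # e.g. negative VP / depth error
    loss = -(beta * reward * log_prob).mean()  # REINFORCE surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(reward.mean())
```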
arXiv Detail & Related papers (2025-09-20T02:23:36Z)
- MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping [52.99503784067417]
We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions.
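A concrete reading of "dense photometric residuals" is sketched below in NumPy for a single camera pair (assumed pinhole model and nearest-neighbour sampling; the actual MCBA spans multiple rigs and jointly refines depth): back-project reference pixels with their depths, warp them into the target view, and difference intensities.

```python
import numpy as np

def project(K, pts_cam):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uv = (K @ (pts_cam / pts_cam[:, 2:3]).T).T
    return uv[:, :2]

def photometric_residual(img_ref, img_tgt, depth_ref, K, R, t):
    """Warp every reference pixel into the target view using its depth
    (R, t map reference-camera points to target-camera points) and
    difference the intensities."""
    h, w = img_ref.shape
    v, u = np.mgrid[0:h, 0:w]
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    pts_ref = depth_ref.ravel() * rays          # back-project, (3, h*w)
    pts_tgt = (R @ pts_ref).T + t               # into target frame, (h*w, 3)
    uv = project(K, pts_tgt)
    # nearest-neighbour lookup keeps the sketch short; real BA interpolates
    ui = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    vi = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return img_tgt[vi, ui] - img_ref.ravel()
```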
arXiv Detail & Related papers (2025-09-17T17:27:53Z)
- Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z)
- GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field [17.57215792490409]
GSFF-SLAM is a novel dense semantic SLAM system based on 3D Gaussian Splatting. Our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals. When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03% mIoU.
arXiv Detail & Related papers (2025-04-28T01:21:35Z)
- GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial and Legged Odometry Fusion SLAM for Dynamic Legged Robotics [12.041115472752594]
GeoFlow-SLAM is a robust and effective tightly-coupled RGBD-inertial SLAM for legged robots undergoing aggressive and high-frequency motions. Our method addresses three critical challenges, including feature matching and visual feature failures in texture-less scenes. The proposed algorithms achieve state-of-the-art (SOTA) performance on self-collected legged-robot datasets and open-source datasets.
arXiv Detail & Related papers (2025-03-18T13:35:49Z)
- GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping [22.432252084121274]
The LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping. We propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities.
arXiv Detail & Related papers (2025-01-15T09:04:56Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping [3.5047603107971397]
We tightly integrate the trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph.
A pipeline for visual-inertial integration is first developed, providing the basic capability of metric-scale localization and mapping.
The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments.
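The fusion pattern described here, a dense bundle-adjustment block and conventional sensor factors constraining shared pose variables, can be sketched with a toy dense factor graph (hypothetical API, NumPy only, not DBA-Fusion's implementation; real systems exploit sparsity):

```python
import numpy as np

class TinyFactorGraph:
    """Toy dense factor graph over num_poses 6-DoF variables; real systems
    exploit sparsity, but the accumulation pattern is the same."""

    def __init__(self, num_poses, dof=6):
        self.dof = dof
        self.n = num_poses * dof
        self.H = np.zeros((self.n, self.n))
        self.g = np.zeros(self.n)

    def add_hessian_factor(self, pose_ids, H_blk, g_blk):
        """Absorb a pre-linearized constraint 0.5 dx^T H_blk dx + g_blk^T dx
        over the listed poses (e.g. a dense-BA block, an IMU factor, ...)."""
        sl = np.concatenate([np.arange(i * self.dof, (i + 1) * self.dof)
                             for i in pose_ids])
        self.H[np.ix_(sl, sl)] += H_blk
        self.g[sl] += g_blk

    def solve(self, damping=1e-6):
        """One (damped) Gauss-Newton step given all accumulated factors."""
        return np.linalg.solve(self.H + damping * np.eye(self.n), -self.g)
```

In this pattern the deep dense BA contributes a pre-linearized block over the sliding-window poses, while IMU or GNSS factors add their own blocks over the same indices; a single solve then yields the joint metric-scale update.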
arXiv Detail & Related papers (2024-03-20T16:20:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.