ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association
- URL: http://arxiv.org/abs/2509.01584v1
- Date: Mon, 01 Sep 2025 16:12:23 GMT
- Title: ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association
- Authors: Ganlin Zhang, Shenhan Qian, Xi Wang, Daniel Cremers
- Abstract summary: ViSTA-SLAM is a real-time monocular visual SLAM system that operates without requiring camera intrinsics. Our approach achieves superior performance in both camera tracking and dense 3D reconstruction quality.
- Score: 52.34293412010292
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present ViSTA-SLAM as a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This design significantly reduces model complexity (the frontend is only 35% the size of comparable state-of-the-art methods) while enhancing the quality of the two-view constraints used in the pipeline. In the backend, we construct a specially designed Sim(3) pose graph that incorporates loop closures to address accumulated drift. Extensive experiments demonstrate that our approach achieves superior performance in both camera tracking and dense 3D reconstruction quality compared to current methods. GitHub repository: https://github.com/zhangganlin/vista-slam
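To make the backend idea concrete, here is a minimal sketch of Sim(3) transforms and of how drift shows up when relative transforms are chained around a loop. It is not the authors' implementation: the abstract only states that the backend is a Sim(3) pose graph with loop closures. The names `Sim3`, `rot_z`, and `loop_error` are hypothetical, and the assumed convention is that a Sim(3) element maps a point x to s * R x + t, i.e. a rigid transform with an extra scale factor.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z axis (enough for a toy example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

class Sim3:
    """Hypothetical similarity transform: x -> s * R x + t."""

    def __init__(self, s, R, t):
        self.s, self.R, self.t = s, R, t

    def apply(self, x):
        Rx = matvec(self.R, x)
        return [self.s * Rx[i] + self.t[i] for i in range(3)]

    def compose(self, other):
        # (self o other)(x) = s1 R1 (s2 R2 x + t2) + t1
        return Sim3(self.s * other.s,
                    matmul(self.R, other.R),
                    self.apply(other.t))

    def inverse(self):
        # Solve y = s R x + t for x: x = (1/s) R^T y - (1/s) R^T t
        Rt = transpose(self.R)
        inv_s = 1.0 / self.s
        return Sim3(inv_s, Rt, [-inv_s * v for v in matvec(Rt, self.t)])

def loop_error(edges):
    """Chain relative edges around a closed loop.

    A drift-free loop composes to the identity, so the returned
    (log-scale drift, translation drift) pair would be (0, 0).
    """
    acc = Sim3(1.0, rot_z(0.0), [0.0, 0.0, 0.0])
    for edge in edges:
        acc = acc.compose(edge)
    scale_drift = abs(math.log(acc.s))
    trans_drift = math.sqrt(sum(v * v for v in acc.t))
    return scale_drift, trans_drift

# Four 90-degree turns with unit steps trace a closed square.
square = [Sim3(1.0, rot_z(math.pi / 2), [1.0, 0.0, 0.0]) for _ in range(4)]
# The same loop with a 5% scale error per edge no longer closes.
drifted = [Sim3(1.05, rot_z(math.pi / 2), [1.0, 0.0, 0.0]) for _ in range(4)]
```

In this toy setting, `loop_error(square)` is numerically zero, while `loop_error(drifted)` reports a nonzero scale and translation gap. A pose-graph backend would treat a gap of this kind as a loop-closure residual to minimize; the optimization itself is outside the scope of this sketch.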
Related papers
- 3AM: 3egment Anything with Geometric Consistency in Videos [32.069894075133305]
3AM is a training-time enhancement that integrates 3D-aware features from MUSt3R into SAM2. Our method requires only RGB input at inference, with no camera poses or preprocessing. On challenging datasets with wide-baseline motion (ScanNet++, Replica), 3AM substantially outperforms SAM2 and extensions.
arXiv Detail & Related papers (2026-01-13T18:59:54Z)
- MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping [52.99503784067417]
We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions.
arXiv Detail & Related papers (2025-09-17T17:27:53Z)
- cuVSLAM: CUDA accelerated visual odometry and mapping [72.43057259584663]
cuVSLAM is a state-of-the-art solution for visual simultaneous localization and mapping. It can operate with a variety of visual-inertial sensor suites, including multiple RGB and depth cameras, and inertial measurement units.
arXiv Detail & Related papers (2025-06-04T18:20:17Z)
- SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding [11.512986158947733]
3D Gaussian splatting (3D-GS) has recently revolutionized novel view synthesis in the simultaneous localization and mapping problem. We propose SEGS-SLAM, a structure-enhanced 3D Gaussian Splatting SLAM, which achieves high-quality photorealistic mapping.
arXiv Detail & Related papers (2025-01-09T13:50:26Z)
- MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors [15.764342766592808]
We present a real-time monocular dense SLAM system designed bottom-up from MASt3R. Our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model.
arXiv Detail & Related papers (2024-12-16T23:00:05Z)
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos [33.57444419305241]
SLAM3R is a novel system for real-time, high-quality, dense 3D reconstruction using RGB videos. It seamlessly integrates local 3D reconstruction and global coordinate registration through feed-forward neural networks. It achieves state-of-the-art reconstruction accuracy and completeness while maintaining real-time performance at 20+ FPS.
arXiv Detail & Related papers (2024-12-12T16:08:03Z)
- Dual-Camera Smooth Zoom on Mobile Phones [55.4114152554769]
We introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview.
The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection.
We suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene.
arXiv Detail & Related papers (2024-04-07T10:28:01Z)
- InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
- Gaussian Splatting SLAM [16.3858380078553]
We present the first application of 3D Gaussian Splatting in monocular SLAM.
Our method runs live at 3 fps, unifying the required representation for accurate tracking, mapping, and high-quality rendering.
Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera.
arXiv Detail & Related papers (2023-12-11T18:19:04Z)
- Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras [13.693353009049773]
This paper demonstrates a visual SLAM system that uses point and line clouds for robust camera localization, together with an embedded piece-wise planar reconstruction (PPR) module.
We address the challenge of reconstructing geometric primitives with scale ambiguity by proposing several run-time optimizations on the reconstructed lines and planes.
The results show that our proposed SLAM tightly incorporates the semantic features to boost both tracking and backend optimization.
arXiv Detail & Related papers (2022-07-13T09:05:35Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.