VGGT-Motion: Motion-Aware Calibration-Free Monocular SLAM for Long-Range Consistency
- URL: http://arxiv.org/abs/2602.05508v1
- Date: Thu, 05 Feb 2026 10:07:11 GMT
- Title: VGGT-Motion: Motion-Aware Calibration-Free Monocular SLAM for Long-Range Consistency
- Authors: Zhuang Xiong, Chen Zhang, Qingshan Xu, Wenbing Tao
- Abstract summary: VGGT-Motion is a calibration-free SLAM system for efficient global consistency over kilometer-scale trajectories. We first propose a motion-aware submap construction mechanism that uses optical flow to guide adaptive partitioning. We then design an anchor-driven direct Sim(3) registration strategy. Experiments show that VGGT-Motion markedly improves trajectory accuracy and efficiency.
- Score: 28.71501560297241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent progress in calibration-free monocular SLAM via 3D vision foundation models, scale drift remains severe on long sequences. Motion-agnostic partitioning breaks contextual coherence and causes zero-motion drift, while conventional geometric alignment is computationally expensive. To address these issues, we propose VGGT-Motion, a calibration-free SLAM system for efficient and robust global consistency over kilometer-scale trajectories. Specifically, we first propose a motion-aware submap construction mechanism that uses optical flow to guide adaptive partitioning, prune static redundancy, and encapsulate turns for stable local geometry. We then design an anchor-driven direct Sim(3) registration strategy. By exploiting context-balanced anchors, it achieves search-free, pixel-wise dense alignment and efficient loop closure without costly feature matching. Finally, a lightweight submap-level pose graph optimization enforces global consistency with linear complexity, enabling scalable long-range operation. Experiments show that VGGT-Motion markedly improves trajectory accuracy and efficiency, achieving state-of-the-art performance in zero-shot, long-range calibration-free monocular SLAM.
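The abstract describes submap partitioning driven by per-frame optical-flow magnitude: near-static frames are pruned, and high-motion frames (turns) force a submap boundary. The paper does not publish this logic as code, so the following is only a hypothetical sketch of such a policy; the function name, thresholds (`static_thresh`, `turn_thresh`), and length limits are all illustrative assumptions, not the authors' implementation.

```python
def partition_submaps(flow_mags, min_len=5, max_len=40,
                      static_thresh=0.5, turn_thresh=8.0):
    """Hypothetical motion-aware submap partitioning sketch.

    flow_mags: per-frame mean optical-flow magnitude (pixels/frame).
    Returns a list of (start, end) half-open frame-index ranges.
    Near-static frames are dropped (pruning static redundancy); a
    high-motion frame or an over-long run closes the current submap,
    so sharp turns end up encapsulated at submap boundaries.
    """
    submaps, start = [], None
    for i, mag in enumerate(flow_mags):
        if mag < static_thresh:
            # Static frame: close any open submap and skip the frame.
            if start is not None and i - start >= min_len:
                submaps.append((start, i))
            start = None
            continue
        if start is None:
            start = i  # open a new submap at the first moving frame
        # Close the submap on a sharp turn or when it grows too long.
        if mag > turn_thresh or i - start + 1 >= max_len:
            submaps.append((start, i + 1))
            start = None
    if start is not None and len(flow_mags) - start >= min_len:
        submaps.append((start, len(flow_mags)))
    return submaps
```

For example, a sequence of 10 moving frames, 5 near-static frames, and 10 more moving frames would yield two submaps with the static middle pruned. In the actual system each submap would then be reconstructed locally and aligned to its neighbors via the anchor-driven Sim(3) registration.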
Related papers
- Geometry OR Tracker: Universal Geometric Operating Room Tracking [61.399734016038614]
In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition. Camera calibration and RGB-D registration are unreliable, leading to cross-view geometric inconsistency. We introduce Geometry OR Tracker, a two-stage pipeline that rectifies imprecise calibration into a scale-consistent and geometrically consistent camera setup.
arXiv Detail & Related papers (2026-02-28T09:21:21Z) - From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting [26.57713792657793]
We propose a motion-adaptive framework that aligns control density with motion complexity. We show significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.
arXiv Detail & Related papers (2025-10-03T05:33:58Z) - WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance [17.295532380360992]
WorldForge is a training-free, inference-time framework composed of three tightly coupled modules. Our framework is plug-and-play and model-agnostic, enabling broad applicability across various 3D/4D tasks.
arXiv Detail & Related papers (2025-09-18T16:40:47Z) - PMGS: Reconstruction of Projectile Motion across Large Spatiotemporal Spans via 3D Gaussian Splatting [9.314869696272297]
This study proposes PMGS, focusing on reconstructing projectile motion via 3D Gaussian Splatting. We introduce an acceleration constraint to bridge Newtonian mechanics and pose estimation, and design a dynamic simulated deformation strategy that adaptively schedules learning rates based on motion states.
arXiv Detail & Related papers (2025-08-04T17:49:37Z) - Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction [57.76758872762516]
3D Gaussian Splatting (3DGS) has emerged as a high-fidelity and efficient paradigm for online free-viewpoint video (FVV) reconstruction. We propose a novel Compact Gaussian Streaming (ComGS) framework, leveraging the locality and consistency of motion in dynamic scenes. ComGS achieves a remarkable storage reduction of over 159x compared to 3DGStream and 14x compared to the SOTA method QUEEN.
arXiv Detail & Related papers (2025-05-22T11:22:09Z) - DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking [50.038098341549095]
State estimation is challenging for 3D object tracking with high maneuverability. We propose a novel framework, DIMM, to effectively combine estimates from different motion models in each direction. DIMM significantly improves the tracking accuracy of existing state estimation methods by 31.61% to 99.23%.
arXiv Detail & Related papers (2025-05-18T10:12:41Z) - Steepest Descent Density Control for Compact 3D Gaussian Splatting [72.54055499344052]
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for real-time, high-resolution novel view synthesis. We propose a theoretical framework that demystifies and improves density control in 3DGS. We introduce SteepGS, incorporating steepest descent density control, a principled strategy that minimizes loss while maintaining a compact point cloud.
arXiv Detail & Related papers (2025-05-08T18:41:38Z) - Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS [52.3215552448623]
Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions.
Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS.
Most existing works rely on per-pixel image loss functions, such as L2 loss.
In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS.
arXiv Detail & Related papers (2024-08-16T13:11:22Z) - Improving Gaussian Splatting with Localized Points Management [52.009874685460694]
Localized Point Management (LPM) is capable of identifying those error-contributing zones most in need of both point addition and geometry calibration. LPM applies point densification in the identified zones and then resets the opacity of the points in front of these regions, creating a new opportunity to correct poorly conditioned points. Notably, LPM improves both static 3DGS and dynamic SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds.
arXiv Detail & Related papers (2024-06-06T16:55:07Z) - Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames [15.455647477995312]
Monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving.
This paper proposes an improved real-time monocular SLAM using deep learning-based semantic segmentation.
Experiments with six video sequences demonstrate that the proposed monocular SLAM system achieves significantly higher trajectory tracking accuracy.
arXiv Detail & Related papers (2021-04-30T22:34:45Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency, and robustness of conventional monocular SLAM systems.
Our approach runs up to 10x faster with comparable accuracy against the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.