Dream-SLAM: Dreaming the Unseen for Active SLAM in Dynamic Environments
- URL: http://arxiv.org/abs/2602.21967v1
- Date: Wed, 25 Feb 2026 14:48:49 GMT
- Title: Dream-SLAM: Dreaming the Unseen for Active SLAM in Dynamic Environments
- Authors: Xiangqi Meng, Pengxu Hou, Zhenjun Zhao, Javier Civera, Daniel Cremers, Hesheng Wang, Haoang Li,
- Abstract summary: We propose a novel monocular active SLAM method, Dream-SLAM. It is based on dreaming cross-spatio-temporal images and semantically plausible structures of partially observed dynamic environments. Experiments on both public and self-collected datasets demonstrate that Dream-SLAM outperforms state-of-the-art methods in localization accuracy, mapping quality, and exploration efficiency.
- Score: 62.70468717776612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In addition to the core tasks of simultaneous localization and mapping (SLAM), active SLAM involves generating robot actions that enable effective and efficient exploration of unknown environments. However, existing active SLAM pipelines are limited by three main factors. First, they inherit the restrictions of the underlying SLAM modules that they may be using. Second, their motion planning strategies are typically shortsighted and lack long-term vision. Third, most approaches struggle to handle dynamic scenes. To address these limitations, we propose a novel monocular active SLAM method, Dream-SLAM, which is based on dreaming cross-spatio-temporal images and semantically plausible structures of partially observed dynamic environments. The generated cross-spatio-temporal images are fused with real observations to mitigate noise and data incompleteness, leading to more accurate camera pose estimation and a more coherent 3D scene representation. Furthermore, we integrate dreamed and observed scene structures to enable long-horizon planning, producing farsighted trajectories that promote efficient and thorough exploration. Extensive experiments on both public and self-collected datasets demonstrate that Dream-SLAM outperforms state-of-the-art methods in localization accuracy, mapping quality, and exploration efficiency. Source code will be publicly available upon paper acceptance.
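As a rough illustration of the fusion idea in the abstract, the sketch below merges a partially observed depth map with a generatively "dreamed" completion: dreamed values fill unobserved pixels, and a small blend suppresses noise in observed ones. The function name, the mask convention, and the blending scheme are hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_views(observed, dreamed, valid_mask, blend=0.2):
    """Fuse a partially observed depth map with a dreamed completion.

    Unobserved pixels (valid_mask == 0) are filled from the dreamed map;
    observed pixels keep mostly the real measurement, blended with a small
    fraction of the dreamed prediction to mitigate sensor noise.
    """
    observed = np.asarray(observed, dtype=float)
    dreamed = np.asarray(dreamed, dtype=float)
    m = np.asarray(valid_mask, dtype=float)  # 1 where observed, 0 where missing
    # observed regions: (1 - blend) * observed + blend * dreamed
    # unobserved regions: dreamed only
    return m * ((1.0 - blend) * observed + blend * dreamed) + (1.0 - m) * dreamed
```

With `blend=0.0` this reduces to a pure hole-filling step; the paper's actual fusion operates on generated cross-spatio-temporal images rather than a single depth map.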
Related papers
- Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert [60.88976842557026]
Vision-Language Models (VLMs) have demonstrated impressive planning and reasoning capabilities. Recent dual-system approaches attempt to decouple "thinking" from "acting". We introduce a framework centered around a generalizable action expert.
arXiv Detail & Related papers (2025-10-04T18:33:27Z) - Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA [21.362682837521632]
Latent Action Models (LAMs) enable Vision-Language-Action systems to learn semantic action representations from large-scale unannotated data. We propose Farsighted-LAM, a latent action framework with geometry-aware spatial encoding and multi-scale temporal modeling. We further propose SSM-VLA, an end-to-end VLA framework built upon Farsighted-LAM, which integrates structured perception with a visual Chain-of-Thought module.
arXiv Detail & Related papers (2025-09-30T13:41:43Z) - DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge [41.030494146004806]
We propose DreamVLA, a novel VLA framework that integrates comprehensive world knowledge forecasting to enable inverse dynamics modeling. DreamVLA introduces dynamic-region-guided world knowledge prediction, integrated with spatial and semantic cues, which provides compact yet comprehensive representations for action planning. Experiments in both real-world and simulation environments demonstrate that DreamVLA achieves a 76.7% success rate on real robot tasks.
arXiv Detail & Related papers (2025-07-06T16:14:29Z) - MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation [51.07118703442774]
Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. We propose the first distributed multi-agent collaborative neural SLAM framework with hybrid scene representation. A novel triplane-grid joint scene representation method is proposed to improve scene reconstruction, and a novel intra-to-inter loop closure method is designed to achieve local (single-agent) and global (multi-agent) consistency.
arXiv Detail & Related papers (2025-06-23T14:22:29Z) - MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System [19.16370123474815]
We propose MCOO-SLAM, a novel Multi-Camera Omnidirectional Object SLAM system. Our approach integrates point features and object-level landmarks enhanced with open-vocabulary semantics. Extensive experiments in real-world environments demonstrate that MCOO-SLAM achieves accurate localization and scalable object-level mapping.
arXiv Detail & Related papers (2025-06-18T12:20:34Z) - LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System [10.484879583010466]
We propose a more efficient visual SLAM model, called LRSLAM, that utilizes low-rank tensor decomposition methods. Our approach achieves better convergence rates, memory efficiency, and reconstruction/localization quality than existing state-of-the-art approaches.
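A minimal sketch of the memory saving behind low-rank representations, using a plain truncated SVD on a 2D grid (LRSLAM's actual tensor decomposition of signed distance fields is more elaborate; names here are illustrative):

```python
import numpy as np

def low_rank_compress(grid, rank):
    """Truncated-SVD compression of a 2D feature grid.

    Returns the rank-r reconstruction and the number of scalars that must be
    stored for the factors (U_r, s_r, V_r^T) instead of the full grid.
    """
    u, s, vt = np.linalg.svd(np.asarray(grid, dtype=float), full_matrices=False)
    u_r, s_r, vt_r = u[:, :rank], s[:rank], vt[:rank]
    approx = (u_r * s_r) @ vt_r          # rank-r reconstruction
    stored = u_r.size + s_r.size + vt_r.size
    return approx, stored
```

For an n-by-n grid the factors cost roughly 2nr + r scalars instead of n^2, which is the source of the memory efficiency claimed above when r is small.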
arXiv Detail & Related papers (2025-06-12T10:55:12Z) - Multimodal LLM Guided Exploration and Active Mapping using Fisher Information [33.19609196571658]
We present an active mapping system that plans for both long-horizon exploration goals and short-term actions using a 3D Gaussian Splatting (3DGS) representation. Experiments conducted on the Gibson and Habitat-Matterport 3D datasets demonstrate state-of-the-art results of the proposed method.
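The Fisher-information-driven planning idea can be sketched with a standard information utility: score each candidate viewpoint by the log-determinant of its Fisher information and greedily pick the most informative one. This is a generic stand-in; the paper's 3DGS-specific formulation and the function names below are not from the source.

```python
import numpy as np

def pick_next_view(candidate_jacobians, noise_var=1.0):
    """Greedy next-best-view selection.

    Each candidate viewpoint is represented by its measurement Jacobian J;
    its Fisher information (under isotropic Gaussian noise) is J^T J / var.
    The candidate maximizing log det of that matrix is selected.
    """
    def logdet_info(jac):
        info = jac.T @ jac / noise_var
        sign, logdet = np.linalg.slogdet(info)
        return logdet if sign > 0 else -np.inf  # degenerate views score worst
    scores = [logdet_info(j) for j in candidate_jacobians]
    return int(np.argmax(scores)), scores
```

Maximizing log det (D-optimality) shrinks the pose-uncertainty ellipsoid fastest, which is why it is a common utility in active mapping.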
arXiv Detail & Related papers (2024-10-22T20:51:45Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - Event-based Simultaneous Localization and Mapping: A Comprehensive Survey [52.73728442921428]
This survey reviews event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.
It categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods.
arXiv Detail & Related papers (2023-04-19T16:21:14Z) - Det-SLAM: A semantic visual SLAM for highly dynamic scenes using Detectron2 [0.0]
This research combines the visual SLAM system ORB-SLAM3 with Detectron2 to present the Det-SLAM system.
Det-SLAM is more resilient than previous dynamic SLAM systems and can lower the estimated camera pose error in dynamic indoor scenarios.
arXiv Detail & Related papers (2022-10-01T13:25:11Z) - NICE-SLAM: Neural Implicit Scalable Encoding for SLAM [112.6093688226293]
NICE-SLAM is a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation.
Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust.
arXiv Detail & Related papers (2021-12-22T18:45:44Z) - DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM [2.9822184411723645]
DynaSLAM II is a visual SLAM system for stereo and RGB-D configurations that tightly integrates the multi-object tracking capability.
We demonstrate that tracking dynamic objects not only provides rich clues for scene understanding but is also beneficial for camera tracking.
arXiv Detail & Related papers (2020-10-15T15:25:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.