World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations
- URL: http://arxiv.org/abs/2512.03429v1
- Date: Wed, 03 Dec 2025 04:15:31 GMT
- Title: World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations
- Authors: Raul Steinmetz, Fabio Demo Rosa, Victor Augusto Kich, Jair Augusto Bottega, Ricardo Bedin Grando, Daniel Fernando Tello Gamarra,
- Abstract summary: This paper presents a novel model-based RL framework built on top of the DreamerV3 algorithm.<n>It integrates a Multi-Layer Perceptron Variational Autoencoder (MLP-VAE) within a world model to encode high-dimensional LIDAR readings into compact latent representations.<n>Experiments on simulated TurtleBot3 navigation tasks demonstrate that the proposed architecture achieves faster convergence and higher success rate.
- Score: 0.7239024032079358
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous navigation of terrestrial robots using Reinforcement Learning (RL) from LIDAR observations remains challenging due to the high dimensionality of sensor data and the sample inefficiency of model-free approaches. Conventional policy networks struggle to process full-resolution LIDAR inputs, forcing prior works to rely on simplified observations that reduce spatial awareness and navigation robustness. This paper presents a novel model-based RL framework built on top of the DreamerV3 algorithm, integrating a Multi-Layer Perceptron Variational Autoencoder (MLP-VAE) within a world model to encode high-dimensional LIDAR readings into compact latent representations. These latent features, combined with a learned dynamics predictor, enable efficient imagination-based policy optimization. Experiments on simulated TurtleBot3 navigation tasks demonstrate that the proposed architecture achieves faster convergence and higher success rate compared to model-free baselines such as SAC, DDPG, and TD3. It is worth emphasizing that the DreamerV3-based agent attains a 100% success rate across all evaluated environments when using the full dataset of the Turtlebot3 LIDAR (360 readings), while model-free methods plateaued below 85%. These findings demonstrate that integrating predictive world models with learned latent representations enables more efficient and robust navigation from high-dimensional sensory data.
Related papers
- RadioGen3D: 3D Radio Map Generation via Adversarial Learning on Large-Scale Synthetic Data [62.63849426834315]
Radio maps are essential for efficient radio resource management in future 6G and low-altitude networks.<n>Deep learning (DL) techniques have emerged as an efficient alternative to conventional ray-tracing for radio map estimation.<n>We present the RadioGen3D framework to capture essential 3D signal propagation characteristics and antenna polarization effects.
arXiv Detail & Related papers (2026-02-21T07:50:05Z) - Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory [101.2076718776139]
We propose a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments.<n>We introduce a Pose-free Memory (HPMC) that distills historical latents into a fixed-budget geometric representation.<n>We also propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a tri-state logic.
arXiv Detail & Related papers (2026-02-02T17:52:56Z) - D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation [66.7166217399105]
Embodied agents face a critical dilemma that end-to-end models lack interpretability and explicit 3D reasoning.<n>Our model introduces two key innovations: 1) A Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) A Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive and partially-annotated hybrid data.
arXiv Detail & Related papers (2025-12-14T09:53:15Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.<n>Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.<n>We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation [1.71849622776539]
This paper introduces a novel deep learning-based multimodal fusion architecture aimed at enhancing the perception capabilities of autonomous navigation robots.<n>By utilizing innovative feature extraction modules, adaptive fusion strategies, and time-series modeling mechanisms, the system effectively integrates RGB images and LiDAR data.
arXiv Detail & Related papers (2025-04-26T19:04:21Z) - Bridging Simulation and Reality: A 3D Clustering-Based Deep Learning Model for UAV-Based RF Source Localization [0.0]
Unmanned aerial vehicles (UAVs) offer significant advantages for RF source localization over terrestrial methods.<n>Recent advancements in deep learning (DL) have further enhanced localization accuracy, particularly for outdoor scenarios.<n>We propose the 3D Cluster-Based RealAdaptRNet, a DL-based method leveraging 3D clustering-based feature extraction for robust localization.
arXiv Detail & Related papers (2025-02-02T05:48:44Z) - A Practical Approach to Underwater Depth and Surface Normals Estimation [3.0516727053033392]
This paper presents a novel deep learning model for Monocular Depth and Surface Normals Estimation (MDSNE)<n>It is specifically tailored for underwater environments, using a hybrid architecture that integrates CNNs with Transformers.<n>Our model reduces parameters by 90% and training costs by 80%, allowing real-time 3D perception on resource-constrained devices.
arXiv Detail & Related papers (2024-10-02T22:41:12Z) - Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively uniform cross-model voxel features.
We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer.
We evaluated the framework on datasets and nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z) - Tracking and Planning with Spatial World Models [17.698319441265223]
We introduce a method for real-time navigation and tracking with differentiably rendered world models.
We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations.
arXiv Detail & Related papers (2022-01-25T14:16:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.