Related papers: World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations

World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations

URL: http://arxiv.org/abs/2512.03429v1
Date: Wed, 03 Dec 2025 04:15:31 GMT
Title: World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations
Authors: Raul Steinmetz, Fabio Demo Rosa, Victor Augusto Kich, Jair Augusto Bottega, Ricardo Bedin Grando, Daniel Fernando Tello Gamarra,
Abstract summary: This paper presents a novel model-based RL framework built on top of the DreamerV3 algorithm.<n>It integrates a Multi-Layer Perceptron Variational Autoencoder (MLP-VAE) within a world model to encode high-dimensional LIDAR readings into compact latent representations.<n>Experiments on simulated TurtleBot3 navigation tasks demonstrate that the proposed architecture achieves faster convergence and higher success rate.
Score: 0.7239024032079358
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Autonomous navigation of terrestrial robots using Reinforcement Learning (RL) from LIDAR observations remains challenging due to the high dimensionality of sensor data and the sample inefficiency of model-free approaches. Conventional policy networks struggle to process full-resolution LIDAR inputs, forcing prior works to rely on simplified observations that reduce spatial awareness and navigation robustness. This paper presents a novel model-based RL framework built on top of the DreamerV3 algorithm, integrating a Multi-Layer Perceptron Variational Autoencoder (MLP-VAE) within a world model to encode high-dimensional LIDAR readings into compact latent representations. These latent features, combined with a learned dynamics predictor, enable efficient imagination-based policy optimization. Experiments on simulated TurtleBot3 navigation tasks demonstrate that the proposed architecture achieves faster convergence and higher success rate compared to model-free baselines such as SAC, DDPG, and TD3. It is worth emphasizing that the DreamerV3-based agent attains a 100% success rate across all evaluated environments when using the full dataset of the Turtlebot3 LIDAR (360 readings), while model-free methods plateaued below 85%. These findings demonstrate that integrating predictive world models with learned latent representations enables more efficient and robust navigation from high-dimensional sensory data.

Related papers

RadioGen3D: 3D Radio Map Generation via Adversarial Learning on Large-Scale Synthetic Data [62.63849426834315]
Radio maps are essential for efficient radio resource management in future 6G and low-altitude networks.<n>Deep learning (DL) techniques have emerged as an efficient alternative to conventional ray-tracing for radio map estimation.<n>We present the RadioGen3D framework to capture essential 3D signal propagation characteristics and antenna polarization effects.
arXiv Detail & Related papers (2026-02-21T07:50:05Z)
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory [101.2076718776139]
We propose a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments.<n>We introduce a Pose-free Memory (HPMC) that distills historical latents into a fixed-budget geometric representation.<n>We also propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a tri-state logic.
arXiv Detail & Related papers (2026-02-02T17:52:56Z)
D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation [66.7166217399105]
Embodied agents face a critical dilemma that end-to-end models lack interpretability and explicit 3D reasoning.<n>Our model introduces two key innovations: 1) A Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) A Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive and partially-annotated hybrid data.
arXiv Detail & Related papers (2025-12-14T09:53:15Z)
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.<n>Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.<n>We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation [1.71849622776539]
This paper introduces a novel deep learning-based multimodal fusion architecture aimed at enhancing the perception capabilities of autonomous navigation robots.<n>By utilizing innovative feature extraction modules, adaptive fusion strategies, and time-series modeling mechanisms, the system effectively integrates RGB images and LiDAR data.
arXiv Detail & Related papers (2025-04-26T19:04:21Z)
Bridging Simulation and Reality: A 3D Clustering-Based Deep Learning Model for UAV-Based RF Source Localization [0.0]
Unmanned aerial vehicles (UAVs) offer significant advantages for RF source localization over terrestrial methods.<n>Recent advancements in deep learning (DL) have further enhanced localization accuracy, particularly for outdoor scenarios.<n>We propose the 3D Cluster-Based RealAdaptRNet, a DL-based method leveraging 3D clustering-based feature extraction for robust localization.
arXiv Detail & Related papers (2025-02-02T05:48:44Z)
A Practical Approach to Underwater Depth and Surface Normals Estimation [3.0516727053033392]
This paper presents a novel deep learning model for Monocular Depth and Surface Normals Estimation (MDSNE)<n>It is specifically tailored for underwater environments, using a hybrid architecture that integrates CNNs with Transformers.<n>Our model reduces parameters by 90% and training costs by 80%, allowing real-time 3D perception on resource-constrained devices.
arXiv Detail & Related papers (2024-10-02T22:41:12Z)
Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively uniform cross-model voxel features. We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer. We evaluated the framework on datasets and nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z)
4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning. We apply 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints. We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
Tracking and Planning with Spatial World Models [17.698319441265223]
We introduce a method for real-time navigation and tracking with differentiably rendered world models. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations.
arXiv Detail & Related papers (2022-01-25T14:16:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.