Related papers: A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion

A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion

URL: http://arxiv.org/abs/2602.05855v1
Date: Thu, 05 Feb 2026 16:38:42 GMT
Title: A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion
Authors: Dennis Bank, Joost Cordes, Thomas Seel, Simon F. G. Ehlers,
Abstract summary: This paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation.<n>A hybrid-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction.<n>Results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations.
Score: 2.9223917785251285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable terrain perception is a critical prerequisite for the deployment of humanoid robots in unstructured, human-centric environments. While traditional systems often rely on manually engineered, single-sensor pipelines, this paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation. A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction fused with a Gated Recurrent Unit (GRU) core for temporal consistency. The architecture integrates multimodal data from an Intel RealSense depth camera, a LIVOX MID-360 LiDAR processed via efficient spherical projection, and an onboard IMU. Quantitative results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations. Furthermore, the integration of a 3.2 s temporal context reduces mapping drift.

Related papers

Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation [0.0]
This study introduces MitUNet, a hybrid neural network combining a Mix-Transformer encoder and a U-Net decoder.<n>Our approach achieves a balance between precision and recall, ensuring accurate boundary recovery.<n> Experiments on the CubiCasa5k dataset and a proprietary regional dataset demonstrate MitUNet's superiority in generating structurally correct masks.
arXiv Detail & Related papers (2025-12-02T04:47:53Z)
Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements.<n>Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI.<n> Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery [0.5837061763460748]
This work presents OmniUnet, a transformer-based neural network architecture for semantic segmentation using RGB, depth, and thermal imagery.<n>A custom multimodal sensor housing was developed using 3D printing and mounted on the Martian Rover Testbed for Autonomy.<n>A subset of this dataset was manually labeled to support supervised training of the network.<n>Inference tests yielded an average prediction time of 673 ms on a resource-constrained computer.
arXiv Detail & Related papers (2025-08-01T12:23:29Z)
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing [55.366556355538954]
We propose the Dynamic Iterative Shrinkage Thresholding Network (DISTA-Net), which reconceptualizes traditional sparse reconstruction within a dynamic framework.<n>DISTA-Net is the first deep learning model designed specifically for the unmixing of closely-spaced infrared small targets.<n>We have established the first open-source ecosystem to foster further research in this field.
arXiv Detail & Related papers (2025-05-25T13:52:00Z)
Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation [1.71849622776539]
This paper introduces a novel deep learning-based multimodal fusion architecture aimed at enhancing the perception capabilities of autonomous navigation robots.<n>By utilizing innovative feature extraction modules, adaptive fusion strategies, and time-series modeling mechanisms, the system effectively integrates RGB images and LiDAR data.
arXiv Detail & Related papers (2025-04-26T19:04:21Z)
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR. UniTR processes a variety of modalities with unified modeling and shared parameters. UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
FRAME: Fast and Robust Autonomous 3D point cloud Map-merging for Egocentric multi-robot exploration [2.433860819518925]
This article presents a 3D point cloud map-merging framework for egocentric heterogeneous multi-robot exploration. The novel proposed solution utilizes state-of-the-art place recognition learned descriptors, that through the framework's main pipeline, offer a fast and robust region overlap estimation. The efficacy of the proposed framework is experimentally evaluated based on multiple field multi-robot exploration missions in underground environments.
arXiv Detail & Related papers (2023-01-22T21:59:38Z)
NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction. The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network. A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems [13.490605853268837]
Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field. This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments.
arXiv Detail & Related papers (2021-08-24T03:26:08Z)
Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems [92.26462290867963]
Kimera-Multi is the first multi-robot system that is robust and capable of identifying and rejecting incorrect inter and intra-robot loop closures. We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots.
arXiv Detail & Related papers (2021-06-28T03:56:40Z)
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video. Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer. To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.