Scene-Agnostic Traversability Labeling and Estimation via a Multimodal Self-supervised Framework
- URL: http://arxiv.org/abs/2508.18249v1
- Date: Mon, 25 Aug 2025 17:40:16 GMT
- Title: Scene-Agnostic Traversability Labeling and Estimation via a Multimodal Self-supervised Framework
- Authors: Zipeng Fang, Yanbo Wang, Lei Zhao, Weidong Chen
- Abstract summary: Traversability estimation is critical for enabling robots to navigate across diverse terrains and environments. We propose a multimodal self-supervised framework for traversability labeling and estimation. Our approach consistently achieves around 88% IoU across diverse datasets.
- Score: 9.925474085627275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traversability estimation is critical for enabling robots to navigate across diverse terrains and environments. While recent self-supervised learning methods achieve promising results, they often fail to capture the characteristics of non-traversable regions. Moreover, most prior works concentrate on a single modality, overlooking the complementary strengths offered by integrating heterogeneous sensory modalities for more robust traversability estimation. To address these limitations, we propose a multimodal self-supervised framework for traversability labeling and estimation. First, our annotation pipeline integrates footprint, LiDAR, and camera data as prompts for a vision foundation model, generating traversability labels that account for both semantic and geometric cues. Then, leveraging these labels, we train a dual-stream network that jointly learns from different modalities in a decoupled manner, enhancing its capacity to recognize diverse traversability patterns. In addition, we incorporate sparse LiDAR-based supervision to mitigate the noise introduced by pseudo labels. Finally, extensive experiments conducted across urban, off-road, and campus environments demonstrate the effectiveness of our approach. The proposed automatic labeling method consistently achieves around 88% IoU across diverse datasets. Compared to existing self-supervised state-of-the-art methods, our multimodal traversability estimation network yields consistently higher IoU, improving by 1.6-3.5% on all evaluated datasets.
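As a concrete reference for the reported metric, a minimal sketch of IoU (intersection over union) on binary traversability masks might look like the following. The mask contents and the `prediction`/`ground_truth` names are illustrative assumptions, not data from the paper:

```python
def iou(pred, gt):
    """Intersection over union between two binary masks.

    pred, gt: flat sequences of 0/1 labels (1 = traversable).
    Returns IoU in [0, 1]; defined as 1.0 when both masks are empty.
    """
    inter = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    union = sum(1 for p, g in zip(pred, gt) if p == 1 or g == 1)
    return inter / union if union else 1.0

# Toy example: 8-pixel predicted vs. labeled traversable regions.
prediction = [1, 1, 1, 0, 0, 1, 0, 0]
ground_truth = [1, 1, 0, 0, 1, 1, 0, 0]
print(round(iou(prediction, ground_truth), 3))  # intersection 3, union 5 -> 0.6
```

In practice the masks would be per-pixel labels over full camera frames, and the paper's ~88% figure is an IoU of roughly 0.88 averaged across datasets.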
Related papers
- Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion [31.189038928192648]
Co2S is a semi-supervised RS segmentation framework that fuses priors from vision-language models and self-supervised models. An explicit-implicit semantic co-guidance mechanism is introduced that utilizes text embeddings and learnable queries. Experiments on six popular datasets demonstrate the superiority of the proposed method.
arXiv Detail & Related papers (2025-12-28T18:24:19Z) - Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model [62.889356203346985]
We propose DUal-STream diffusion (DUST), a world-model augmented VLA framework that handles the modality conflict. DUST achieves up to 6% gains over a standard VLA baseline and implicit world-modeling methods. On real-world tasks with the Franka Research 3, DUST outperforms baselines in success rate by 13%.
arXiv Detail & Related papers (2025-10-31T16:32:12Z) - Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision-making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which suffers from training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which yields myopic acquisitions. We introduce a latent variable model, trained in a supervised manner; acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z) - SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [68.9300049150948]
We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID). Existing data collection approaches yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. We present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood.
arXiv Detail & Related papers (2025-05-04T13:00:29Z) - Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization [2.733505168507872]
Drone-view Geo-Localization (DVGL) aims to achieve accurate localization of drones by retrieving the most relevant GPS-tagged satellite images. Existing methods heavily rely on strictly pre-paired drone-satellite images for supervised learning. We propose an end-to-end self-supervised learning method with a shallow backbone network.
arXiv Detail & Related papers (2025-02-17T02:53:08Z) - Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection [18.285299184361598]
LiDAR-based 3D object detection is a critical technology for the development of autonomous driving and robotics.
We propose a novel and effective active learning (AL) method called Distribution Discrepancy and Feature Heterogeneity (DDFH)
It simultaneously considers geometric features and model embeddings, assessing information from both the instance-level and frame-level perspectives.
arXiv Detail & Related papers (2024-09-09T08:26:11Z) - Stochastic Vision Transformers with Wasserstein Distance-Aware Attention [8.407731308079025]
Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data.
We introduce a new vision transformer that integrates uncertainty and distance awareness into self-supervised learning pipelines.
Our proposed method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on a variety of datasets.
arXiv Detail & Related papers (2023-11-30T15:53:37Z) - Fusing Pseudo Labels with Weak Supervision for Dynamic Traffic Scenarios [0.0]
We introduce a weakly-supervised label unification pipeline that merges pseudo labels from object detection models trained on heterogeneous datasets.
Our pipeline produces a unified label space by combining labels from disparate datasets, rectifying bias and enhancing generalization.
We retrain a single object detection model on the merged label space, yielding a robust model suited to dynamic traffic scenarios.
arXiv Detail & Related papers (2023-08-30T11:33:07Z) - Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z) - Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions.
Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many.
Our work addresses two key challenges in trajectory prediction: learning diverse outputs, and improving predictions by imposing constraints derived from driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.