Related papers: RoadMamba: A Dual Branch Visual State Space Model for Road Surface Classification

RoadMamba: A Dual Branch Visual State Space Model for Road Surface Classification

URL: http://arxiv.org/abs/2508.01210v1
Date: Sat, 02 Aug 2025 05:54:38 GMT
Title: RoadMamba: A Dual Branch Visual State Space Model for Road Surface Classification
Authors: Tianze Wang, Zhang Zhang, Chao Yue, Nuoran Li, Chao Sun,
Abstract summary: Mamba architecture has shown remarkable performance in visual processing tasks.<n>However, existing Mamba architectures struggle to achieve state-of-the-art visual road surface classification.<n>We propose a method that effectively combines local and global perception, called RoadMamba.<n>The proposed RoadMamba achieves the state-of-the-art performance in experiments on a large-scale road surface classification dataset.
Score: 7.33243132385824
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Acquiring the road surface conditions in advance based on visual technologies provides effective information for the planning and control system of autonomous vehicles, thus improving the safety and driving comfort of the vehicles. Recently, the Mamba architecture based on state-space models has shown remarkable performance in visual processing tasks, benefiting from the efficient global receptive field. However, existing Mamba architectures struggle to achieve state-of-the-art visual road surface classification due to their lack of effective extraction of the local texture of the road surface. In this paper, we explore for the first time the potential of visual Mamba architectures for road surface classification task and propose a method that effectively combines local and global perception, called RoadMamba. Specifically, we utilize the Dual State Space Model (DualSSM) to effectively extract the global semantics and local texture of the road surface and decode and fuse the dual features through the Dual Attention Fusion (DAF). In addition, we propose a dual auxiliary loss to explicitly constrain dual branches, preventing the network from relying only on global semantic information from the deep large receptive field and ignoring the local texture. The proposed RoadMamba achieves the state-of-the-art performance in experiments on a large-scale road surface classification dataset containing 1 million samples.

Related papers

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world.<n>underlinetextbf3D vision-language learning enables embodied agents to effectively explore and understand their environment.<n>model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
arXiv Detail & Related papers (2025-07-05T14:15:52Z)
RoadFormer : Local-Global Feature Fusion for Road Surface Classification in Autonomous Driving [7.3210301283888315]
The classification of the type of road surface (RSC) aims to utilize pavement features to identify the roughness, wet and dry conditions, and material information of the road surface.<n>In autonomous driving, accurate RSC allows vehicles to better understand the road environment, adjust driving strategies, and ensure a safer and more efficient driving experience.<n>We propose a vision-based fine-grained RSC method for autonomous driving scenarios, which fuses local and global feature information through the stacking of convolutional and transformer modules.
arXiv Detail & Related papers (2025-06-03T01:23:19Z)
PillarMamba: Learning Local-Global Context for Roadside Point Cloud via Hybrid State Space Model [6.919896038096772]
We introduce Mamba to pillar-based roadside point cloud perception.<n>We propose a framework based on Cross-stage State-space Group (CSG), called PillarMamba.<n>The proposed method outperforms the state-of-the-art methods on the popular large scale roadside benchmark: DAIR-V2X-I.
arXiv Detail & Related papers (2025-05-08T16:33:04Z)
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the ready knowledge in existing Transformer models to alternative architecture Mamba, termed TransMamba.<n>Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness in across uni-modal and cross-modal tasks.<n>For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment. Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field. We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs [1.7648680700685022]
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. Recent years, both Convolution Neural Network (CNN)-based and Transformer-based methods have been widely explored. MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy.
arXiv Detail & Related papers (2024-04-22T05:12:11Z)
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
Dynamic loss balancing and sequential enhancement for road-safety assessment and traffic scene classification [0.0]
Road-safety inspection is an indispensable instrument for reducing road-accident fatalities contributed to road infrastructure. Recent work formalizes road-safety assessment in terms of carefully selected risk factors that are also known as road-safety attributes. We propose to reduce dependency on tedious human labor by automating recognition with a two-stage neural architecture.
arXiv Detail & Related papers (2022-11-08T11:10:07Z)
Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet) CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement. Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.