PillarMamba: Learning Local-Global Context for Roadside Point Cloud via Hybrid State Space Model
- URL: http://arxiv.org/abs/2505.05397v1
- Date: Thu, 08 May 2025 16:33:04 GMT
- Title: PillarMamba: Learning Local-Global Context for Roadside Point Cloud via Hybrid State Space Model
- Authors: Zhang Zhang, Chao Sun, Chao Yue, Da Wen, Tianze Wang, Jianghao Leng,
- Abstract summary: We introduce Mamba to pillar-based roadside point cloud perception. We propose a framework based on Cross-stage State-space Group (CSG), called PillarMamba. The proposed method outperforms the state-of-the-art methods on the popular large-scale roadside benchmark: DAIR-V2X-I.
- Score: 6.919896038096772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Serving Intelligent Transport System (ITS) and Vehicle-to-Everything (V2X) tasks, roadside perception has received increasing attention in recent years, as it can extend the perception range of connected vehicles and improve traffic safety. However, 3D object detection on roadside point clouds has not been effectively explored. To some extent, the performance of a point cloud detector depends on the receptive field of the network and its ability to use the scene context effectively. The recent emergence of Mamba, based on the State Space Model (SSM), has challenged the convolutions and transformers that have long been the foundational building blocks, owing to its efficient global receptive field. In this work, we introduce Mamba to pillar-based roadside point cloud perception and propose a framework based on the Cross-stage State-space Group (CSG), called PillarMamba. It enhances the expressiveness of the network and achieves efficient computation through cross-stage feature fusion. However, because of the limitations of scan directions, the state space model suffers from disrupted local connections and forgotten historical relationships. To address this, we propose the Hybrid State-space Block (HSB) to obtain the local-global context of the roadside point cloud. Specifically, it enhances neighborhood connections through local convolution and preserves historical memory through residual attention. The proposed method outperforms the state-of-the-art methods on the popular large-scale roadside benchmark: DAIR-V2X-I. The code will be released soon.
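The scan-direction limitation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes a row-major scan over a hypothetical 100-pillar-wide bird's-eye-view grid and a scalar, fixed-coefficient linear recurrence (Mamba's actual parameters are input-dependent). All function names and constants are illustrative. It shows that vertically adjacent pillars end up far apart in the flattened sequence, and that the influence of an earlier input shrinks over that distance.

```python
def row_major_index(row, col, width):
    """Position of grid cell (row, col) in a row-by-row scan."""
    return row * width + col


def scan_gap(cell_a, cell_b, width):
    """Number of sequence steps separating two cells in the row-major scan."""
    return abs(row_major_index(*cell_a, width) - row_major_index(*cell_b, width))


def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


width = 100  # hypothetical grid width in pillars (assumption)

# Horizontal neighbours stay adjacent in the sequence; vertical ones do not.
horizontal_gap = scan_gap((10, 10), (10, 11), width)
vertical_gap = scan_gap((10, 10), (11, 10), width)

# Feed a unit impulse and read how much of it survives after `vertical_gap` steps.
impulse = [1.0] + [0.0] * vertical_gap
response = ssm_scan(impulse)

print("horizontal gap:", horizontal_gap)
print("vertical gap:", vertical_gap)
print("impulse influence after one step:", response[1])
print("impulse influence after the vertical gap:", response[-1])
```

With a fixed decay of 0.9, the contribution of a pillar to its vertical neighbour is scaled by 0.9 raised to the 100th power, which is why a purely sequential scan benefits from an extra local operator (the abstract names local convolution) and a mechanism for retaining earlier states (the abstract names residual attention).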
Related papers
- RoadMamba: A Dual Branch Visual State Space Model for Road Surface Classification [7.33243132385824]
The Mamba architecture has shown remarkable performance in visual processing tasks. However, existing Mamba architectures struggle to achieve state-of-the-art visual road surface classification. We propose a method that effectively combines local and global perception, called RoadMamba. The proposed RoadMamba achieves state-of-the-art performance in experiments on a large-scale road surface classification dataset.
arXiv Detail & Related papers (2025-08-02T05:54:38Z)
- MambaMap: Online Vectorized HD Map Construction using State Space Model [11.15033113060733]
MambaMap is a novel framework that efficiently fuses long-range temporal features in the state space to construct online vectorized HD maps. Specifically, MambaMap incorporates a memory bank to store and utilize information from historical frames. In addition, we design innovative multi-directional and spatial-temporal scanning strategies to enhance feature extraction at both BEV and instance levels.
arXiv Detail & Related papers (2025-07-27T11:09:27Z)
- TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes [49.43995864524434]
We propose a novel image-to-point cloud registration (I2P) method, TrafficLoc, that matches in a coarse-to-fine fashion. To overcome the lack of large-scale real-world intersection datasets, we first introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. Our TrafficLoc greatly improves the performance over the SOTA I2P methods (up to 86%) on Carla Intersection and generalizes well to real-world data.
arXiv Detail & Related papers (2024-12-13T17:42:53Z)
- Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
- CoMamba: Real-time Cooperative Perception Unlocked with State Space Models [39.87600356189242]
CoMamba is a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception.
CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities.
arXiv Detail & Related papers (2024-09-16T20:02:19Z)
- OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition [10.39935021754015]
We develop OverlapMamba, a novel network for place recognition as sequences.
Our method effectively detects loop closures even when traversing previously visited locations from different directions.
Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed.
arXiv Detail & Related papers (2024-05-13T17:46:35Z)
- RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception [8.145851017138618]
We release the first real-world, large-scale RCooper dataset to foster research on practical roadside cooperative perception.
The dataset comprises 50k images and 30k point clouds, including two representative traffic scenes.
The constructed benchmarks prove the effectiveness of roadside cooperation perception and demonstrate the direction of further research.
arXiv Detail & Related papers (2024-03-15T09:44:02Z)
- Point Cloud Mamba: Point Cloud Learning via State Space Model [73.7454734756626]
We show that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs).
Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z)
- MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles [58.461077944514564]
This paper presents MSight, a cutting-edge roadside perception system specifically designed for automated vehicles.
MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction.
Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency.
arXiv Detail & Related papers (2023-10-08T21:32:30Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained flow is an emerging yet crucial problem.
We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks.
Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z)
- SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving [64.10636296274168]
Road extraction is an essential step in building autonomous navigation systems.
Using only convolutional neural networks (ConvNets) for this problem is not effective, as they are inefficient at capturing distant dependencies between road segments in the image.
We propose a Spatial and Interaction Space Graph Reasoning (SPIN) module which when plugged into a ConvNet performs reasoning over graphs constructed on spatial and interaction spaces projected from the feature maps.
arXiv Detail & Related papers (2021-09-16T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.