MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering
- URL: http://arxiv.org/abs/2408.11464v1
- Date: Wed, 21 Aug 2024 09:29:45 GMT
- Title: MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering
- Authors: Yonglin Tian, Songlin Bai, Zhiyao Luo, Yutong Wang, Yisheng Lv, Fei-Yue Wang,
- Abstract summary: We propose a Mamba-based occupancy prediction method (MambaOcc) adopting BEV features to ease the burden of 3D scenario representation.
Experiments on the Occ3D-nuScenes dataset demonstrate that MambaOcc achieves state-of-the-art performance in terms of both accuracy and computational efficiency.
- Score: 31.239405105904574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occupancy prediction has attracted intensive attention and shown great superiority in the development of autonomous driving systems. The fine-grained environmental representation brought by occupancy prediction in terms of both geometry and semantic information has facilitated the general perception and safe planning under open scenarios. However, it also brings high computation costs and heavy parameters in existing works that utilize voxel-based 3d dense representation and Transformer-based quadratic attention. To address these challenges, in this paper, we propose a Mamba-based occupancy prediction method (MambaOcc) adopting BEV features to ease the burden of 3D scenario representation, and linear Mamba-style attention to achieve efficient long-range perception. Besides, to address the sensitivity of Mamba to sequence order, we propose a local adaptive reordering (LAR) mechanism with deformable convolution and design a hybrid BEV encoder comprised of convolution layers and Mamba. Extensive experiments on the Occ3D-nuScenes dataset demonstrate that MambaOcc achieves state-of-the-art performance in terms of both accuracy and computational efficiency. For example, compared to FlashOcc, MambaOcc delivers superior results while reducing the number of parameters by 42\% and computational costs by 39\%. Code will be available at https://github.com/Hub-Tian/MambaOcc.
Related papers
- HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models [20.956716048789474]
Mamba model has emerged as a promising solution for efficient modeling.
We propose the Mamba Policy, which reduces the parameter count by over 80% compared to the original policy network.
Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, Dexart, and MetaWorld datasets.
arXiv Detail & Related papers (2024-09-11T10:21:21Z) - UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images [4.9571046933387395]
UNetMamba is a UNet-like semantic segmentation model based on Mamba.
Experiments demonstrate that UNetMamba outperforms the state-of-the-art methods with mIoU increased by 0.87% on LoveDA and 0.39% on ISPRS Vaihingen.
arXiv Detail & Related papers (2024-08-21T11:53:53Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - OccMamba: Semantic Occupancy Prediction with State Space Models [16.646162677831985]
We present the first Mamba-based network for semantic occupancy prediction, termed OccMamba.
We present a simple yet effective 3D-to-1D reordering operation, i.e., height-prioritized 2D Hilbert expansion.
OccMamba achieves state-of-the-art performance on three prevalent occupancy prediction benchmarks.
arXiv Detail & Related papers (2024-08-19T10:07:00Z) - Is Mamba Effective for Time Series Forecasting? [30.85990093479062]
We propose a Mamba-based model named Simple-Mamba (S-Mamba) for time series forecasting.
Specifically, we tokenize the time points of each variate autonomously via a linear layer.
Experiments on thirteen public datasets prove that S-Mamba maintains low computational overhead and achieves leading performance.
arXiv Detail & Related papers (2024-03-17T08:50:44Z) - MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection [72.46396769642787]
We develop a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient infrared small target detection.
MiM-ISTD is $8 times$ faster than the SOTA method and reduces GPU memory usage by 62.2$%$ when testing on $2048 times 2048$ images.
arXiv Detail & Related papers (2024-03-04T15:57:29Z) - PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z) - FlashOcc: Fast and Memory-Efficient Occupancy Prediction via
Channel-to-Height Plugin [32.172269679513285]
FlashOCC consolidates rapid and memory-efficient occupancy prediction.
Channel-to-height transformation is introduced to lift the output logits from the BEV into the 3D space.
Results substantiate the superiority of our plug-and-play paradigm over previous state-of-the-art methods.
arXiv Detail & Related papers (2023-11-18T15:28:09Z) - OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic
Occupancy Perception [73.05425657479704]
We propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.
We extend the large-scale nuScenes dataset with dense semantic occupancy annotations.
Considering the complexity of surrounding occupancy perception, we propose the Cascade Occupancy Network (CONet) to refine the coarse prediction.
arXiv Detail & Related papers (2023-03-07T15:43:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.