WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection
- URL: http://arxiv.org/abs/2511.13138v1
- Date: Mon, 17 Nov 2025 08:46:54 GMT
- Title: WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection
- Authors: Longhui Zheng, Qiming Xia, Xiaolu Chen, Zhaoliang Liu, Chenglu Wen,
- Abstract summary: WinMamba is a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module. Experiments on the KITTI and Waymo datasets demonstrate that WinMamba significantly outperforms the baseline.
- Score: 22.498942151484624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is critical for autonomous driving, yet it remains fundamentally challenging to simultaneously maximize computational efficiency and capture long-range spatial dependencies. We observed that Mamba-based models, with their linear state-space design, capture long-range dependencies at lower cost, offering a promising balance between efficiency and accuracy. However, existing methods rely on axis-aligned scanning within a fixed window, inevitably discarding spatial information. To address this problem, we propose WinMamba, a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module that compensates voxel features across varying resolutions during sampling. Meanwhile, to obtain rich contextual cues within the linear state space, we equip the WinMamba layer with a learnable positional encoding and a window-shift strategy. Extensive experiments on the KITTI and Waymo datasets demonstrate that WinMamba significantly outperforms the baseline. Ablation studies further validate the individual contributions of the WSF and AWF modules in improving detection accuracy. The code will be made publicly available.
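The abstract contrasts scanning within a fixed window with a window-shift strategy. The sketch below illustrates only that general idea on 2D voxel coordinates and is not the authors' implementation: the function names, the `window` and `shift` parameters, and the half-window shift (borrowed from the common shifted-window convention) are all assumptions made for illustration.

```python
from collections import defaultdict


def window_ids(coords, window, shift=(0, 0)):
    """Map each (x, y) voxel coordinate to the index of its window.

    `window` is the window size in voxels along x and y; `shift` offsets
    the window grid before partitioning. Both are hypothetical parameters,
    not names taken from the paper.
    """
    wx, wy = window
    sx, sy = shift
    return [((x + sx) // wx, (y + sy) // wy) for x, y in coords]


def group_by_window(coords, window, shift=(0, 0)):
    """Group voxels by window, ordering each group by x then y
    (a simple axis-aligned scan order inside the window)."""
    groups = defaultdict(list)
    for coord, wid in zip(coords, window_ids(coords, window, shift)):
        groups[wid].append(coord)
    return {wid: sorted(members) for wid, members in groups.items()}


# Four non-empty voxels; (3, 0) and (4, 0) are direct neighbours along x.
voxels = [(3, 0), (4, 0), (0, 1), (7, 1)]

# Fixed 4x4 windows: the two neighbours land in different windows.
regular = group_by_window(voxels, window=(4, 4))

# Same windows shifted by half a window: the neighbours now share a window.
shifted = group_by_window(voxels, window=(4, 4), shift=(2, 2))
```

With the fixed grid, a sequence model that scans each window separately never sees (3, 0) and (4, 0) together; alternating between the regular and shifted partitions lets neighbouring voxels on either side of a window border interact in at least one of the two passes.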
Related papers
- Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection [16.398581898787608]
We propose a novel backbone, Fore-Mamba3D, that focuses on foreground enhancement by modifying the Mamba-based encoder. Considering the response attenuation that arises in the interaction of foreground voxels across different instances, we design a regional-to-global slide window. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies in the linear autoregressive model.
arXiv Detail & Related papers (2026-02-23T06:03:07Z)
- Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework [66.2103745798444]
Saliency Mamba (Samba) is a pure Mamba-based architecture that flexibly handles various distinct salient object detection (SOD) tasks. Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost. Samba+ achieves even better results on these tasks and datasets by using a single trained versatile model.
arXiv Detail & Related papers (2026-02-02T03:34:25Z)
- SfMamba: Efficient Source-Free Domain Adaptation via Selective Scan Modeling [60.860172819390954]
Source-free domain adaptation (SFDA) tackles the challenge of adapting source-pretrained models to unlabeled target domains. We propose a framework called SfMamba to fully explore the stable dependency in source-free model transfer.
arXiv Detail & Related papers (2026-01-13T14:53:47Z)
- AtrousMamaba: An Atrous-Window Scanning Visual State Space Model for Remote Sensing Change Detection [29.004019252136565]
We propose a novel model, AtrousMamba, which balances the extraction of fine-grained local details with the integration of global contextual information. By leveraging the atrous window scan visual state space (AWVSS) module, we design dedicated end-to-end Mamba-based frameworks for binary change detection (BCD) and semantic change detection (SCD). Experimental results on six benchmark datasets show that the proposed framework outperforms existing CNN-based, Transformer-based, and Mamba-based methods.
arXiv Detail & Related papers (2025-07-22T02:36:16Z)
- ConMamba: Contrastive Vision Mamba for Plant Disease Detection [3.60543005189868]
Plant Disease Detection (PDD) is a key aspect of precision agriculture. Existing deep learning methods often rely on extensively annotated datasets. We propose ConMamba, a novel framework specially designed for PDD.
arXiv Detail & Related papers (2025-06-03T03:01:38Z)
- UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection [53.785766442201094]
Recent advances in LiDAR 3D detection have demonstrated the effectiveness of Transformer-based frameworks in capturing global dependencies in point cloud spaces. Due to the considerable number of 3D voxels and the quadratic complexity of Transformers, multiple sequences are grouped before being fed to Transformers, leading to a limited receptive field. Inspired by the impressive performance of State Space Models (SSM) in 2D vision tasks, we propose a novel Unified Mamba (UniMamba). Specifically, we design a UniMamba block that mainly consists of locality modeling, Z-order serialization, and a local-global sequential aggregator.
arXiv Detail & Related papers (2025-03-15T06:22:31Z)
- STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems. Most existing methods based on CNNs and transformers still suffer from substantial computational burdens. We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for speech enhancement (SE) tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
- QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model [16.01259690063522]
A new vision Mamba model, coined QuadMamba, captures local dependencies of varying granularities via quadtree-based image partition and scan.
QuadMamba achieves state-of-the-art performance in various vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2024-10-09T12:03:50Z)
- VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
arXiv Detail & Related papers (2024-01-18T17:55:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.