Related papers: Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection

URL: http://arxiv.org/abs/2602.19536v1
Date: Mon, 23 Feb 2026 06:03:07 GMT
Title: Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
Authors: Zhiwei Ning, Xuanang Gao, Jiaxi Cao, Runze Yang, Huiying Xu, Xinzhong Zhu, Jie Yang, Wei Liu
Abstract summary: We propose a novel backbone, Fore-Mamba3D, to focus on foreground enhancement by modifying the Mamba-based encoder. Considering the response attenuation in the interaction of foreground voxels across different instances, we design a regional-to-global slide window. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies in the linear autoregression model.
Score: 16.398581898787608
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Linear modeling methods like Mamba have emerged as an effective backbone for the 3D object detection task. However, previous Mamba-based methods apply bidirectional encoding to the whole non-empty voxel sequence, which contains abundant useless background information in the scenes. Though directly encoding foreground voxels appears to be a plausible solution, it tends to degrade detection performance. We attribute this to the response attenuation and restricted context representation in the linear modeling of foreground-only sequences. To address this problem, we propose a novel backbone, termed Fore-Mamba3D, to focus on foreground enhancement by modifying the Mamba-based encoder. The foreground voxels are first sampled according to the predicted scores. Considering the response attenuation in the interaction of foreground voxels across different instances, we design a regional-to-global slide window (RGSW) to propagate information from regional splits to the entire sequence. Furthermore, a semantic-assisted and state spatial fusion module (SASFMamba) is proposed to enrich contextual representation by enhancing semantic and geometric awareness within the Mamba model. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies in the linear autoregression model. The superior performance across various benchmarks demonstrates the effectiveness of Fore-Mamba3D in the 3D object detection task.
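
The score-based sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_foreground`, the `keep_ratio` parameter, and the top-k selection rule are all assumptions made for illustration, since the abstract only states that foreground voxels are sampled according to predicted scores.

```python
from typing import List, Sequence


def sample_foreground(scores: Sequence[float], keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the highest-scoring voxels, in original sequence order.

    Hypothetical helper: `scores` stands for per-voxel foreground scores and
    `keep_ratio` for the fraction of voxels kept. Neither name comes from the paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not scores:
        return []
    # Keep at least one voxel so the downstream sequence is never empty.
    k = max(1, int(len(scores) * keep_ratio))
    # Rank voxel indices by descending score; sorted() is stable, so ties keep the earlier voxel.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Re-sort the kept indices so the serialized voxel order is preserved.
    return sorted(ranked[:k])


voxel_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7]
print(sample_foreground(voxel_scores, keep_ratio=0.5))  # prints [1, 3, 5]
```

Returning the kept indices in their original order is a design choice of this sketch: a sequence model such as Mamba consumes voxels in serialized order, so the selection should not reorder them.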

Related papers

Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework [66.2103745798444]
Saliency Mamba (Samba) is a pure Mamba-based architecture that flexibly handles various distinct salient object detection tasks. Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost. Samba+ achieves even superior results on these tasks and datasets by using a single trained versatile model.
arXiv Detail & Related papers (2026-02-02T03:34:25Z)
TextMamba: Scene Text Detector with Mamba [6.992080935409672]
We propose a novel scene text detector based on Mamba that integrates the selection mechanism with attention layers. We adopt the Top_k algorithm to explicitly select key information and reduce the interference of irrelevant information in Mamba modeling. Our method achieves state-of-the-art or competitive performance on various benchmarks.
arXiv Detail & Related papers (2025-12-07T05:06:19Z)
WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection [22.498942151484624]
WinMamba is a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module. Experiments on the KITTI and datasets demonstrate that WinMamba significantly outperforms the baseline.
arXiv Detail & Related papers (2025-11-17T08:46:54Z)
AtrousMamaba: An Atrous-Window Scanning Visual State Space Model for Remote Sensing Change Detection [29.004019252136565]
We propose a novel model, AtrousMamba, which balances the extraction of fine-grained local details with the integration of global contextual information. By leveraging the atrous window scan visual state space (AWVSS) module, we design dedicated end-to-end Mamba-based frameworks for binary change detection (BCD) and semantic change detection (SCD). Experimental results on six benchmark datasets show that the proposed framework outperforms existing CNN-based, Transformer-based, and Mamba-based methods.
arXiv Detail & Related papers (2025-07-22T02:36:16Z)
InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba [21.47782205082816]
InceptionNeXt has shown excellent competitiveness in image classification and a number of downstream tasks. Built on parallel one-dimensional strip convolutions, InceptionNeXt has limited ability to capture spatial dependencies along different dimensions. We propose a novel backbone architecture termed InceptionMamba to overcome these limitations.
arXiv Detail & Related papers (2025-06-10T12:31:05Z)
DefMamba: Deformable Visual State Space Model [65.50381013020248]
We propose a novel visual foundation model called DefMamba. By incorporating a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Numerous experiments have shown that DefMamba achieves state-of-the-art performance in various visual tasks.
arXiv Detail & Related papers (2025-04-08T08:22:54Z)
Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of Efficient Hybrid Feature Aggregation Module (EH-FAM) and Adaptive Scale-Aware 3D Regression Head (ASRH). EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects. ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z)
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection [59.34834815090167]
Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before inputting to Transformers, have demonstrated their effectiveness in 3D object detection. We present a Voxel SSM, which employs a group-free strategy to serialize the whole space of voxels into a single sequence.
arXiv Detail & Related papers (2024-06-15T17:45:07Z)
Point Cloud Mamba: Point Cloud Learning via State Space Model [73.7454734756626]
We show that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z)
PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.