Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework
- URL: http://arxiv.org/abs/2602.01593v1
- Date: Mon, 02 Feb 2026 03:34:25 GMT
- Title: Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework
- Authors: Wenzhuo Zhao, Keren Fu, Jiahao He, Xiaohong Liu, Qijun Zhao, Guangtao Zhai
- Abstract summary: Saliency Mamba (Samba) is a pure Mamba-based architecture that flexibly handles various distinct salient object detection tasks. Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost. Samba+ achieves even superior results on these tasks and datasets by using a single trained versatile model.
- Score: 66.2103745798444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and quadratic computational complexity of Transformers. Recently, the emerging state-space model, namely Mamba, has shown great potential in balancing global receptive fields and computational efficiency. As a solution, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles various distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD, and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature alignment and aggregation by modeling contextual dependencies. As one step further, to avoid the "task-specific" problem as in previous SOD solutions, we develop Samba+, which is empowered by training Samba in a multi-task joint manner, leading to a more unified and versatile model. Two crucial components that collaboratively tackle challenges encountered in input of arbitrary modalities and continual adaptation are investigated. Specifically, a hub-and-spoke graph attention (HGA) module facilitates adaptive cross-modal interactive fusion, and a modality-anchored continual learning (MACL) strategy alleviates inter-modal conflicts together with catastrophic forgetting. Extensive experiments demonstrate that Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost, whereas Samba+ achieves even superior results on these tasks and datasets by using a single trained versatile model. Additional results further demonstrate the potential of our Samba framework.
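The abstract does not spell out how the spatial neighborhood scanning (SNS) algorithm works, so the following is not the authors' method. It is only a minimal sketch of the general concern that motivates rethinking scan order: a state-space model reads a 2D patch grid as a 1D sequence, and a plain row-major scan places spatially distant patches next to each other at every row boundary. The function names (`raster_order`, `snake_order`, `max_jump`) and the boustrophedon scan used for comparison are illustrative assumptions, not taken from the paper.

```python
def raster_order(height, width):
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(height) for c in range(width)]


def snake_order(height, width):
    """Boustrophedon scan (hypothetical comparison): odd rows run right to left."""
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_jump(order):
    """Largest Manhattan distance between consecutive scan positions."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    h, w = 3, 4
    print(max_jump(raster_order(h, w)))  # 4: jump from end of one row to start of next
    print(max_jump(snake_order(h, w)))   # 1: consecutive tokens are always grid neighbors
```

Both orders visit every patch exactly once; they differ only in whether neighbors in the sequence are also neighbors in the image, which is the property a continuity-preserving scan aims to keep.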
Related papers
- SfMamba: Efficient Source-Free Domain Adaptation via Selective Scan Modeling [60.860172819390954]
Source-free domain adaptation (SFDA) tackles the challenge of adapting source-pretrained models to unlabeled target domains. We propose a framework called SfMamba to fully explore the stable dependency in source-free model transfer.
arXiv Detail & Related papers (2026-01-13T14:53:47Z)
- Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling [12.115520585626046]
We propose a Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN) to address over-smoothing. DMbaGCN consists of two modules: the Local State-Evolution Mamba (LSEMba) for local neighborhood aggregation, and the Global Context-Aware Mamba (GCAMba) that leverages Mamba's global attention capabilities to incorporate global context for each node.
arXiv Detail & Related papers (2025-11-10T06:34:20Z)
- CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification [12.959829835589453]
We propose the Cross State Fusion Mamba (CSFMamba) Network. Specifically, we first design a preprocessing module for remote sensing image information to meet the needs of the Mamba structure. Secondly, a cross-state module based on the Mamba operator is designed to fully fuse the features of the two modalities.
arXiv Detail & Related papers (2025-08-31T03:08:34Z)
- Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations. We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z)
- Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction. We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages. We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge. State Space Models (SSMs) have achieved notable success in computer vision. We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z)
- Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations. We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
- PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks.
PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z)
- DGMamba: Domain Generalization via Generalized State Space Model [80.82253601531164]
Domain generalization (DG) aims at solving distribution shift problems in various scenes.
Mamba, as an emerging state space model (SSM), possesses superior linear complexity and global receptive fields.
We propose a novel framework for DG, named DGMamba, that excels in strong generalizability toward unseen domains.
arXiv Detail & Related papers (2024-04-11T14:35:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.