GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model
- URL: http://arxiv.org/abs/2407.13772v1
- Date: Thu, 18 Jul 2024 17:59:58 GMT
- Title: GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model
- Authors: Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Juergen Gall, Fahad Shahbaz Khan
- Abstract summary: State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
- Score: 66.35608254724566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in state-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity. However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes. To address this, we introduce a Modulated Group Mamba layer that divides the input channels into four groups and applies our proposed SSM-based efficient Visual Single Selective Scanning (VSSS) block independently to each group, with each VSSS block scanning in one of the four spatial directions. The Modulated Group Mamba layer also wraps the four VSSS blocks into a channel modulation operator to improve cross-channel communication. Furthermore, we introduce a distillation-based training objective to stabilize the training of large models, leading to consistent performance gains. Our comprehensive experiments demonstrate the merits of the proposed contributions, leading to superior performance over existing methods for image classification on ImageNet-1K, object detection and instance segmentation on MS-COCO, and semantic segmentation on ADE20K. Our tiny variant with 23M parameters achieves state-of-the-art performance, with a classification top-1 accuracy of 83.3% on ImageNet-1K, while using 26% fewer parameters than the best existing Mamba design of the same model size. Our code and models are available at: https://github.com/Amshaker/GroupMamba.
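The grouped, direction-wise scanning idea described in the abstract can be illustrated with a short sketch. The code below is a minimal PyTorch approximation, not the authors' released implementation: the names (GroupedScanLayer, ScanStub, directional_flatten) are invented for illustration, the per-group VSSS selective-scan block is replaced by a causal depthwise-convolution stand-in so the snippet runs without the mamba_ssm package, and the channel modulation operator is approximated with a simple squeeze-and-excite style gate.

```python
# Minimal sketch of a grouped, direction-wise scanning layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


def directional_flatten(x: torch.Tensor, direction: int) -> torch.Tensor:
    """Flatten a (B, C, H, W) map into a (B, L, C) sequence along one of four
    scan orders: 0 = row-major, 1 = reversed row-major, 2 = column-major,
    3 = reversed column-major."""
    if direction in (2, 3):                  # column-major: swap H and W first
        x = x.transpose(2, 3)
    seq = x.flatten(2).transpose(1, 2)       # (B, L, C) with L = H * W
    return seq.flip(1) if direction in (1, 3) else seq


def directional_unflatten(seq: torch.Tensor, direction: int, h: int, w: int) -> torch.Tensor:
    """Inverse of directional_flatten, back to a (B, C, H, W) map."""
    if direction in (1, 3):
        seq = seq.flip(1)
    x = seq.transpose(1, 2)                  # (B, C, L)
    if direction in (2, 3):
        return x.reshape(x.size(0), x.size(1), w, h).transpose(2, 3)
    return x.reshape(x.size(0), x.size(1), h, w)


class ScanStub(nn.Module):
    """Stand-in for a single-direction selective-scan (VSSS) block: a causal
    depthwise 1D convolution over the scan order plus a pointwise projection.
    A faithful implementation would run a selective SSM recurrence here."""

    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        self.dwconv = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:   # (B, L, C)
        x = seq.transpose(1, 2)                              # (B, C, L)
        x = F.pad(x, (self.kernel_size - 1, 0))              # causal left-padding
        x = self.dwconv(x).transpose(1, 2)                   # (B, L, C), length preserved
        return self.proj(F.silu(x))


class GroupedScanLayer(nn.Module):
    """Splits the channels into four groups, scans each group along one of the
    four spatial directions, then mixes the groups with a channel gate as a
    rough analogue of the cross-channel modulation step."""

    def __init__(self, dim: int):
        super().__init__()
        assert dim % 4 == 0, "channel count must split evenly into 4 groups"
        group_dim = dim // 4
        self.scans = nn.ModuleList([ScanStub(group_dim) for _ in range(4)])
        self.modulation = nn.Sequential(                     # squeeze-excite style gate
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, dim), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # (B, C, H, W)
        b, c, h, w = x.shape
        outs = []
        for direction, (group, scan) in enumerate(zip(x.chunk(4, dim=1), self.scans)):
            seq = directional_flatten(group, direction)      # one direction per group
            outs.append(directional_unflatten(scan(seq), direction, h, w))
        y = torch.cat(outs, dim=1)                            # re-assemble all channels
        gate = self.modulation(y).view(b, c, 1, 1)            # cross-channel communication
        return x + y * gate                                   # residual connection


if __name__ == "__main__":
    layer = GroupedScanLayer(dim=64)
    out = layer(torch.randn(2, 64, 14, 14))
    print(out.shape)                                          # torch.Size([2, 64, 14, 14])
```

The distillation-based training objective mentioned in the abstract can likewise be sketched as a standard soft-target loss; the temperature, weighting, and choice of teacher below are illustrative assumptions rather than the paper's reported settings.

```python
# Hedged sketch of a generic distillation objective (hyperparameters are assumptions).
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Weighted sum of hard-label cross-entropy and a KL term that pulls the
    student toward a (frozen) teacher's temperature-softened distribution."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft
```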
Related papers
- Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation [14.931551206723041]
Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model.
We propose ORANDNet, an advanced ensemble approach tailored for WSSS.
arXiv Detail & Related papers (2024-06-28T03:58:02Z)
- Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning [48.99361249764921]
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution.
However, their quadratic complexity hinders the efficient processing of high resolution 4D inputs.
We propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy.
arXiv Detail & Related papers (2024-06-23T11:28:08Z)
- Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model [26.786890883280062]
State Space Models (SSMs) have garnered widespread attention due to their global receptive field and linear complexity.
To improve the performance of SSMs in vision tasks, a multi-scan strategy is widely adopted.
We introduce Multi-Scale Vision Mamba (MSVMamba) to preserve the superiority of SSMs in vision tasks with limited parameters.
arXiv Detail & Related papers (2024-05-23T04:59:49Z)
- SPMamba: State-space model is all you need in speech separation [6.590157910988076]
We propose a network architecture for speech separation using a state-space model.
We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module.
Our experimental results highlight the strong performance of Mamba-based models in speech separation.
arXiv Detail & Related papers (2024-04-02T16:04:31Z)
- MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection [5.37935922811333]
MambaMixer is a new architecture with data-dependent weights that uses a dual selection mechanism across tokens and channels.
As a proof of concept, we design Vision MambaMixer (ViM2) and Time Series MambaMixer (TSM2) architectures based on the MambaMixer block.
arXiv Detail & Related papers (2024-03-29T00:05:13Z)
- Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation [8.686237221268584]
We propose Triplet Mamba-UNet as a new type of image segmentation network.
Our model achieves a one-third reduction in parameters compared to the previous VM-UNet.
arXiv Detail & Related papers (2024-03-26T13:40:18Z)
- The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
arXiv Detail & Related papers (2024-03-03T18:58:21Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
- Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ).
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
arXiv Detail & Related papers (2020-06-15T18:07:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.