QMamba: Post-Training Quantization for Vision State Space Models
- URL: http://arxiv.org/abs/2501.13624v1
- Date: Thu, 23 Jan 2025 12:45:20 GMT
- Title: QMamba: Post-Training Quantization for Vision State Space Models
- Authors: Yinglong Li, Xiaoyu Liu, Jiacheng Li, Ruikang Xu, Yinda Chen, Zhiwei Xiong
- Abstract summary: State Space Models (SSMs) have gained increasing attention for vision models recently.
Given the computational cost of deploying SSMs on resource-limited edge devices, Post-Training Quantization (PTQ) is a technique with the potential for efficient deployment of SSMs.
We propose QMamba, one of the first PTQ frameworks to be designed for vision SSMs based on the analysis of the activation distributions in SSMs.
- Score: 45.97843526485619
- Abstract: State Space Models (SSMs), as key components of Mamba, have gained increasing attention for vision models recently, thanks to their efficient long sequence modeling capability. Given the computational cost of deploying SSMs on resource-limited edge devices, Post-Training Quantization (PTQ) is a technique with the potential for efficient deployment of SSMs. In this work, we propose QMamba, one of the first PTQ frameworks to our knowledge, designed for vision SSMs based on the analysis of the activation distributions in SSMs. We reveal that the distribution of discrete parameters exhibits long-tailed skewness and the distribution of the hidden state sequence exhibits highly dynamic variations. Correspondingly, we design Long-tailed Skewness Quantization (LtSQ) to quantize discrete parameters and Temporal Group Quantization (TGQ) to quantize hidden states, which reduce the quantization errors. Extensive experiments demonstrate that QMamba outperforms advanced PTQ methods on vision models across multiple model sizes and architectures. Notably, QMamba surpasses existing methods by 21.0% on ImageNet classification with 4-bit activations.
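The abstract's point about highly dynamic hidden-state sequences can be illustrated with a small NumPy sketch. This is not the paper's TGQ implementation: it only compares plain min-max quantization over a whole sequence against the same quantizer applied separately to groups of consecutive time steps. The function names, the group size of 16, and the synthetic sequence whose magnitude grows over time are all assumptions made for illustration.

```python
import numpy as np

def fake_quant_minmax(x, num_bits=4):
    """Quantize x to 2**num_bits uniform levels over [min, max], then dequantize."""
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / step), 0, levels)
    return q * step + lo

def fake_quant_temporal_groups(h, num_bits=4, group_size=16):
    """Apply min-max quantization independently to each group of time steps.

    h has shape (T, D): T time steps, D channels. The grouping rule here
    (fixed-size contiguous chunks) is an illustrative assumption.
    """
    out = np.empty_like(h)
    for start in range(0, h.shape[0], group_size):
        chunk = slice(start, start + group_size)
        out[chunk] = fake_quant_minmax(h[chunk], num_bits)
    return out

# Synthetic "hidden state" sequence whose dynamic range changes over time.
rng = np.random.default_rng(0)
T, D = 64, 8
magnitude = np.logspace(-2, 1, T)            # grows from 0.01 to 10
h = rng.normal(size=(T, D)) * magnitude[:, None]

mse_tensor = float(np.mean((h - fake_quant_minmax(h)) ** 2))
mse_grouped = float(np.mean((h - fake_quant_temporal_groups(h)) ** 2))
print(f"per-tensor 4-bit MSE:     {mse_tensor:.4f}")
print(f"temporal-group 4-bit MSE: {mse_grouped:.4f}")
```

With a single range for the whole sequence, the step size is set by the largest time steps, so early small-magnitude states collapse onto one or two levels. Giving each temporal group its own range costs a few extra scale parameters but keeps the step size matched to the local magnitude.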
Related papers
- Selective State Space Memory for Large Vision-Language Models [0.0]
State Space Memory Integration (SSMI) is a novel approach for efficient fine-tuning of LVLMs.
SSMI captures long-range dependencies and injects task-specific visual and sequential patterns effectively.
Experiments on benchmark datasets, including COCO Captioning, VQA, and Flickr30k, demonstrate that SSMI achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-12-13T05:40:50Z)
- Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
State Space Models (SSMs) have achieved notable success in computer vision.
We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z)
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
- Mamba-PTQ: Outlier Channels in Recurrent Large Language Models [49.1574468325115]
We show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs.
We show that the difficulty of quantizing SSMs is caused by activation outliers, similar to those observed in transformer-based LLMs.
arXiv Detail & Related papers (2024-07-17T08:21:06Z)
- Q-S5: Towards Quantized State Space Models [41.94295877935867]
State Space Models (SSMs) have emerged as a potent alternative to transformers.
This paper investigates the effect of quantization on the S5 model to understand its impact on model performance.
arXiv Detail & Related papers (2024-06-13T09:53:24Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model [26.786890883280062]
State Space Models (SSMs) have garnered widespread attention due to their global receptive field and linear complexity.
To improve the performance of SSMs in vision tasks, a multi-scan strategy is widely adopted.
We introduce Multi-Scale Vision Mamba (MSVMamba) to preserve the superiority of SSMs in vision tasks with limited parameters.
arXiv Detail & Related papers (2024-05-23T04:59:49Z)
- The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
arXiv Detail & Related papers (2024-03-03T18:58:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.