PTQ4VM: Post-Training Quantization for Visual Mamba
- URL: http://arxiv.org/abs/2412.20386v1
- Date: Sun, 29 Dec 2024 07:21:33 GMT
- Title: PTQ4VM: Post-Training Quantization for Visual Mamba
- Authors: Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park
- Abstract summary: We propose Post-Training Quantization for Visual Mamba (PTQ4VM), which introduces two key strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS).
PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation.
- Score: 9.446971590056945
- Abstract: Visual Mamba is an approach that extends the selective state space model, Mamba, to vision tasks. It processes image tokens sequentially in a fixed order, accumulating information to generate outputs. Despite its growing popularity for delivering high-quality outputs at a low computational cost across various tasks, Visual Mamba is highly susceptible to quantization, which makes further performance improvements challenging. Our analysis reveals that the fixed token access order in Visual Mamba introduces unique quantization challenges, which we categorize into three main issues: 1) token-wise variance, 2) channel-wise outliers, and 3) a long tail of activations. To address these challenges, we propose Post-Training Quantization for Visual Mamba (PTQ4VM), which introduces two key strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS). To the best of our knowledge, this is the first quantization study on Visual Mamba. PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation. Extensive experiments on large-scale classification and regression tasks demonstrate its effectiveness, achieving up to 1.83x speedup on GPUs with negligible accuracy loss compared to FP16. Our code is available at https://github.com/YoungHyun197/ptq4vm.
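The abstract only names the two strategies, so the following is a minimal, hypothetical sketch of what Per-Token Static (PTS) quantization could look like: because Visual Mamba visits tokens in a fixed order, the activation range at each token position is assumed stable across inputs and can be calibrated once offline. The function names, tensor shapes, and symmetric fake-quantization scheme below are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch

def calibrate_pts_scales(calib_acts: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Per-Token Static (PTS) calibration sketch (assumed scheme, not the official code).

    Since Visual Mamba scans tokens in a fixed order, the activation magnitude at each
    token position is assumed stable across inputs, so one static scale per token
    position is computed offline from calibration data.
    calib_acts: (num_samples, num_tokens, num_channels)
    """
    qmax = 2 ** (n_bits - 1) - 1
    # Max absolute activation per token position, pooled over samples and channels.
    per_token_absmax = calib_acts.abs().amax(dim=(0, 2))  # (num_tokens,)
    return per_token_absmax / qmax

def fake_quantize_pts(x: torch.Tensor, scales: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Apply the precomputed per-token scales at inference (symmetric fake quantization).
    x: (batch, num_tokens, num_channels)
    """
    qmax = 2 ** (n_bits - 1) - 1
    s = scales.view(1, -1, 1)                       # broadcast one scale per token position
    q = torch.clamp(torch.round(x / s), -qmax - 1, qmax)
    return q * s

# Hypothetical usage with stand-in data (e.g. 196 tokens, 192 channels).
calib = torch.randn(32, 196, 192)
scales = calibrate_pts_scales(calib)
x = torch.randn(4, 196, 192)
x_q = fake_quantize_pts(x, scales)
```

Under this reading, JLSS would then jointly fine-tune a smoothing scale together with the quantization step sizes (for example, by gradient descent against the FP16 outputs) to absorb channel-wise outliers and the long activation tail; the exact objective is not described in the abstract.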
Related papers
- MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods [10.08926186041001]
Mamba is an efficient sequence model that rivals Transformers.
Existing quantization methods, which have been effective for CNN and Transformer models, appear inadequate for Mamba.
We propose MambaQuant, a post-training quantization framework consisting of: 1) Karhunen-Loeve Transformation (KLT) enhanced rotation, rendering the rotation matrix adaptable to diverse channel distributions, and 2) Smooth-Fused rotation, which equalizes channel variances and can merge additional parameters into model weights.
arXiv Detail & Related papers (2025-01-23T08:57:33Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z) - V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance when processing 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba (V2M) model as a complete solution that directly processes image tokens in 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z) - PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training [13.926804198202582]
Mamba, emerging as a groundbreaking architecture in the field of generative AI, demonstrates remarkable proficiency in handling long sequences.
The existing Mamba training framework is inefficient with variable-length sequence inputs.
We propose PackMamba, a high-throughput Mamba that efficiently handles variable-length sequences.
arXiv Detail & Related papers (2024-08-07T16:13:43Z) - Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment [15.320011514412437]
We present the first exploration of the recently popular foundation model, the state space model Mamba, in image quality assessment (IQA).
We propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks.
Our proposed StylePrompt enables better perception transfer capability with less computational cost.
arXiv Detail & Related papers (2024-06-13T19:21:01Z) - Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity.
We show that Mamba shares surprising similarities with the linear attention Transformer; a minimal sketch of the linear attention formulation is given after the related-papers list.
We propose a Mamba-Inspired Linear Attention (MILA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z) - Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z) - Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z) - Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs [49.88461345825586]
This paper proposes a new framework to enhance the fine-grained image understanding abilities of MLLMs.
We present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
We show that our model exhibits a 5.2% accuracy improvement over Qwen-VL and surpasses the accuracy of Kosmos-2 by 24.7%.
arXiv Detail & Related papers (2023-10-01T05:53:15Z)
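As referenced in the "Demystify Mamba in Vision" entry above, that comparison hinges on the linear attention formulation, where softmax(QK^T)V is replaced by a kernel feature map so the key-value product can be aggregated first. The sketch below is a generic, non-causal linear attention layer for illustration only; it is not the MILA model, and the ELU-based feature map is an assumed, commonly used choice.

```python
import torch
import torch.nn.functional as F

def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Generic (non-causal) linear attention: phi(Q)(phi(K)^T V) / (phi(Q) phi(K)^T 1),
    costing O(N * d^2) instead of the O(N^2 * d) of softmax attention.
    q, k, v: (batch, tokens, dim)
    """
    phi = lambda t: F.elu(t) + 1.0                      # positive feature map (assumed choice)
    kv = torch.einsum("bnd,bne->bde", phi(k), v)        # aggregate keys and values first
    z = phi(k).sum(dim=1)                               # normalizer, shape (batch, dim)
    num = torch.einsum("bnd,bde->bne", phi(q), kv)
    den = torch.einsum("bnd,bd->bn", phi(q), z).unsqueeze(-1) + 1e-6
    return num / den

# Example: 196 image tokens with 64-dim heads.
q = torch.randn(2, 196, 64); k = torch.randn(2, 196, 64); v = torch.randn(2, 196, 64)
out = linear_attention(q, k, v)   # (2, 196, 64)
```

The recurrent (causal) form of this computation accumulates the key-value statistics token by token, which is, roughly, the correspondence to Mamba's sequential scan that the paper analyzes.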