OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models
- URL: http://arxiv.org/abs/2503.10959v1
- Date: Thu, 13 Mar 2025 23:58:55 GMT
- Title: OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models
- Authors: Akshat Ramachandran, Mingyu Lee, Huan Xu, Souvik Kundu, Tushar Krishna
- Abstract summary: We present OuroMamba, the first data-free post-training quantization (DFQ) method for vision Mamba-based models (VMMs). OuroMamba is a two-stage framework: (1) OuroMamba-Gen, which generates semantically rich and meaningful synthetic data; and (2) OuroMamba-Quant, which employs mixed-precision quantization with lightweight dynamic outlier detection during inference.
- Score: 15.757637971482477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present OuroMamba, the first data-free post-training quantization (DFQ) method for vision Mamba-based models (VMMs). We identify two key challenges in enabling DFQ for VMMs: (1) VMMs' recurrent state transitions restrict the capture of long-range interactions, leading to semantically weak synthetic data; (2) VMM activations exhibit dynamic outlier variations across time-steps, rendering existing static PTQ techniques ineffective. To address these challenges, OuroMamba presents a two-stage framework: (1) OuroMamba-Gen, which generates semantically rich and meaningful synthetic data by applying contrastive learning to patch-level VMM features produced through neighborhood interactions in the latent state space; and (2) OuroMamba-Quant, which employs mixed-precision quantization with lightweight dynamic outlier detection during inference. Specifically, we present a threshold-based outlier channel selection strategy for activations that is updated at every time-step. Extensive experiments across vision and generative tasks show that our data-free OuroMamba surpasses existing data-driven PTQ techniques, achieving state-of-the-art performance across diverse quantization settings. Additionally, we implement efficient GPU kernels to achieve a practical latency speedup of up to 2.36x. Code will be released soon.
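OuroMamba's code is not yet released, so the sketch below is only a plausible reading of the abstract's per-time-step outlier handling. The threshold rule (mean plus k standard deviations of the channel magnitudes), the 4-bit/8-bit split, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of threshold-based outlier
# channel selection with mixed-precision quantization, recomputed at
# every recurrent time-step as the abstract describes.
import torch

def quantize_uniform(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantize-dequantize at the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def quantize_step(act_t: torch.Tensor, k: float = 3.0,
                  low_bits: int = 4, high_bits: int = 8) -> torch.Tensor:
    """Quantize one time-step's activations [channels], keeping
    channels above an (assumed) magnitude threshold at high precision."""
    mag = act_t.abs()
    threshold = mag.mean() + k * mag.std()   # recomputed at every time-step
    outlier = mag > threshold                # dynamic outlier channel set
    out = quantize_uniform(act_t, low_bits)  # bulk channels: low precision
    if outlier.any():                        # outlier channels: high precision
        out[outlier] = quantize_uniform(act_t[outlier], high_bits)
    return out

x = torch.randn(16, 256)                     # toy [time_steps, channels] activations
y = torch.stack([quantize_step(x[t]) for t in range(x.shape[0])])
```

The point of the per-step recomputation is that the outlier channel set can shift as the recurrent state evolves, which is exactly the dynamic behavior the abstract says renders static PTQ ineffective.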
Related papers
- MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment [24.053542031123985]
We present MVQA, a Mamba-based model designed for efficient video quality assessment (VQA).
USDS combines semantic patch sampling from low-resolution videos and distortion patch sampling from original-resolution videos.
Experiments show that the proposed MVQA, equipped with USDS, achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-04-22T16:08:23Z) - OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models [36.0400717590138]
We present OmniMamba, the first linear-architecture-based multimodal generation model.
It generates both text and images through a unified next-token prediction paradigm.
It achieves competitive performance with JanusFlow while surpassing Show-o across benchmarks.
arXiv Detail & Related papers (2025-03-11T17:59:46Z) - HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation [11.63340847947103]
We propose a novel HiSTF Mamba framework for text-to-motion generation.
We show that HiSTF Mamba achieves state-of-the-art performance across multiple metrics.
These findings validate the effectiveness of HiSTF Mamba in achieving high fidelity and strong semantic alignment.
arXiv Detail & Related papers (2025-03-10T04:01:48Z) - PTQ4VM: Post-Training Quantization for Visual Mamba [9.446971590056945]
We propose Post-Training Quantization for Visual Mamba (PTQ4VM), which introduces two key strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS).
PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation.
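The summary names PTS quantization without spelling it out; below is a minimal sketch of what per-token static quantization generally looks like. The calibration rule (per-token absmax over a calibration set) and all names are assumptions for illustration, not PTQ4VM's actual code.

```python
# Rough sketch (assumed, not PTQ4VM's implementation) of Per-Token
# Static (PTS) quantization: one scale per token position, fixed
# after calibration rather than recomputed at inference.
import torch

def calibrate_pts_scales(calib_acts: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """calib_acts: [batch, tokens, channels] activations from calibration runs.
    Returns one fixed scale per token position (the 'static' part of PTS)."""
    qmax = 2 ** (bits - 1) - 1
    per_token_max = calib_acts.abs().amax(dim=(0, 2))   # assumed absmax rule
    return (per_token_max / qmax).clamp(min=1e-8)       # [tokens]

def pts_quantize(x: torch.Tensor, scales: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Quantize-dequantize [batch, tokens, channels] with precomputed
    per-token scales; no statistics are gathered at inference time."""
    qmax = 2 ** (bits - 1) - 1
    s = scales.view(1, -1, 1)
    return torch.round(x / s).clamp(-qmax, qmax) * s

calib = torch.randn(32, 196, 384)      # toy calibration activations
scales = calibrate_pts_scales(calib)
out = pts_quantize(torch.randn(2, 196, 384), scales)
```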
arXiv Detail & Related papers (2024-12-29T07:21:33Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing [52.050036778325094]
Video-Ma$^2$mba is a novel architecture that incorporates State Space Models (SSMs) within the Mamba-2 framework.
Our approach significantly reduces the memory footprint compared to standard gradient checkpointing.
By maintaining a detailed capture of temporal dynamics, our model improves the accuracy and relevance of responses in long video understanding tasks.
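For context on the memory claim: below is a minimal sketch of standard activation (gradient) checkpointing in PyTorch, the baseline this entry improves on. The multi-axis variant itself is not reproduced here, and the model and shapes are illustrative assumptions.

```python
# Minimal sketch of standard gradient checkpointing, the baseline
# Video-Ma^2mba's multi-axis variant is compared against.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy stack of blocks standing in for a long-sequence model.
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(256, 256), nn.GELU()) for _ in range(24)
)

def forward(x: torch.Tensor) -> torch.Tensor:
    for blk in blocks:
        # Activations inside blk are recomputed during the backward
        # pass instead of being stored, trading compute for memory.
        x = checkpoint(blk, x, use_reentrant=False)
    return x

x = torch.randn(8, 256, requires_grad=True)
forward(x).sum().backward()
```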
arXiv Detail & Related papers (2024-11-29T04:12:13Z) - MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z) - V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance in processing 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba model as a complete solution, which directly processes image tokens in the 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting [5.166854384000439]
Long-term time series forecasting (LTSF) provides a longer view of future trends and patterns.
Recently, a new state space model (SSM) named Mamba was proposed.
With its selective handling of input data and a hardware-aware parallel computing algorithm, Mamba has shown great potential in balancing predictive performance and computational efficiency.
arXiv Detail & Related papers (2024-04-24T09:45:48Z) - VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity.
At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
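The summary mentions SS2D without detail; based on the VMamba paper's description, SS2D unfolds the 2D feature map along several scan routes before running a selective scan over each. A rough sketch of that cross-scan step, with four assumed directions and illustrative shapes (not VMamba's actual code):

```python
# Rough sketch of SS2D's cross-scan step: a [B, C, H, W] feature map
# is unfolded into four 1D sequences (row-major and column-major
# order, each forward and reversed), which the selective-scan SSM
# then processes independently. Shapes are illustrative.
import torch

def cross_scan(x: torch.Tensor) -> torch.Tensor:
    """Unfold a [B, C, H, W] feature map into four 1D scan sequences."""
    B, C, H, W = x.shape
    row = x.flatten(2)                    # [B, C, H*W], row-major order
    col = x.transpose(2, 3).flatten(2)    # [B, C, H*W], column-major order
    return torch.stack([row, row.flip(-1), col, col.flip(-1)], dim=1)

y = cross_scan(torch.randn(2, 96, 14, 14))
print(y.shape)  # torch.Size([2, 4, 96, 196])
```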
arXiv Detail & Related papers (2024-01-18T17:55:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.