Related papers: Q-S5: Towards Quantized State Space Models

Q-S5: Towards Quantized State Space Models

URL: http://arxiv.org/abs/2406.09477v1
Date: Thu, 13 Jun 2024 09:53:24 GMT
Title: Q-S5: Towards Quantized State Space Models
Authors: Steven Abreu, Jens E. Pedersen, Kade M. Heckel, Alessandro Pierro,
Abstract summary: State Space Models (SSMs) have emerged as a potent alternative to transformers. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance.
Score: 41.94295877935867
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the quest for next-generation sequence modeling architectures, State Space Models (SSMs) have emerged as a potent alternative to transformers, particularly for their computational efficiency and suitability for dynamical systems. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance and to facilitate its deployment to edge and resource-constrained platforms. Using quantization-aware training (QAT) and post-training quantization (PTQ), we systematically evaluate the quantization sensitivity of SSMs across different tasks like dynamical systems modeling, Sequential MNIST (sMNIST) and most of the Long Range Arena (LRA). We present fully quantized S5 models whose test accuracy drops less than 1% on sMNIST and most of the LRA. We find that performance on most tasks degrades significantly for recurrent weights below 8-bit precision, but that other components can be compressed further without significant loss of performance. Our results further show that PTQ only performs well on language-based LRA tasks whereas all others require QAT. Our investigation provides necessary insights for the continued development of efficient and hardware-optimized SSMs.

Related papers

QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models [0.8474310104568011]
Structured State Space models (SSM) have emerged as a new class of deep learning models.<n>QAT can significantly reduce the complexity of SSMs by up to two orders of magnitude across various performance metrics.<n>We show that QAT enhances robustness to analog noise and enables structural pruning.
arXiv Detail & Related papers (2025-07-08T15:19:14Z)
Quantizing Small-Scale State-Space Models for Edge AI [0.4941855521192951]
State-space models (SSMs) have recently gained attention in deep learning for their ability to efficiently model long-range dependencies.<n>In this paper, we analyze the effects of quantization on small-scale SSMs with a focus on reducing memory and computational costs while maintaining task performance.
arXiv Detail & Related papers (2025-06-14T12:43:47Z)
A Survey on Structured State Space Sequence (S4) Models [0.0]
Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
arXiv Detail & Related papers (2025-03-22T01:55:32Z)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [55.75103034526652]
We propose QuartDepth which adopts post-training quantization to quantize MDE models with hardware accelerations for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
arXiv Detail & Related papers (2025-03-20T21:03:10Z)
QMamba: Post-Training Quantization for Vision State Space Models [45.97843526485619]
State Space Models (SSMs) have gained increasing attention for vision models recently. Given the computational cost of deploying SSMs on resource-limited edge devices, Post-Training Quantization (PTQ) is a technique with the potential for efficient deployment of SSMs. We propose QMamba, one of the first PTQ frameworks to be designed for vision SSMs based on the analysis of the activation distributions in SSMs.
arXiv Detail & Related papers (2025-01-23T12:45:20Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
LSTM-QGAN: Scalable NISQ Generative Adversarial Network [3.596166341956192]
Current quantum generative adversarial networks (QGANs) still struggle with practical-sized data. We propose LSTM-QGAN, a QGAN architecture that eliminates preprocessing and integrates quantum long short-term memory (QLSTM) to ensure scalable performance. Our experiments show that LSTM-QGAN significantly enhances both performance and scalability over state-of-the-art QGAN models.
arXiv Detail & Related papers (2024-09-03T18:27:15Z)
HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators. Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
Graph Neural Networks for Parameterized Quantum Circuits Expressibility Estimation [5.074765131677166]
This paper introduces a novel approach for expressibility estimation of quantum circuits using Graph Neural Networks (GNNs) We demonstrate the predictive power of our GNN model with a dataset consisting of 25,000 samples from the noiseless IBM QASM Simulator and 12,000 samples from three distinct noisy quantum backends.
arXiv Detail & Related papers (2024-05-13T18:26:55Z)
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models. We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z)
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study [90.34226812493083]
This work aims to investigate the impact of quantization on emphemergent abilities, which are important characteristics that distinguish LLMs from small language models. Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation. To improve the performance of low-bit models, we conduct two special experiments: (1) fine-gained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning.
arXiv Detail & Related papers (2023-07-16T15:11:01Z)
Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms. We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
End-to-End Quantum Machine Learning Implemented with Controlled Quantum Dynamics [0.9599644507730106]
This work presents a hardware-friendly end-to-end quantum machine learning scheme that can be implemented with imperfect near-term intermediate-scale quantum (NISQ) processors. The proposal transforms the machine learning task to the optimization of controlled quantum dynamics, in which the learning model is parameterized by experimentally tunable control variables. Our design also enables automated feature selection by encoding the raw input to quantum states through agent control variables.
arXiv Detail & Related papers (2020-03-30T17:44:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.