Q-S5: Towards Quantized State Space Models
- URL: http://arxiv.org/abs/2406.09477v1
- Date: Thu, 13 Jun 2024 09:53:24 GMT
- Title: Q-S5: Towards Quantized State Space Models
- Authors: Steven Abreu, Jens E. Pedersen, Kade M. Heckel, Alessandro Pierro,
- Abstract summary: State Space Models (SSMs) have emerged as a potent alternative to transformers.
This paper investigates the effect of quantization on the S5 model to understand its impact on model performance.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the quest for next-generation sequence modeling architectures, State Space Models (SSMs) have emerged as a potent alternative to transformers, particularly for their computational efficiency and suitability for dynamical systems. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance and to facilitate its deployment to edge and resource-constrained platforms. Using quantization-aware training (QAT) and post-training quantization (PTQ), we systematically evaluate the quantization sensitivity of SSMs across different tasks like dynamical systems modeling, Sequential MNIST (sMNIST) and most of the Long Range Arena (LRA). We present fully quantized S5 models whose test accuracy drops less than 1% on sMNIST and most of the LRA. We find that performance on most tasks degrades significantly for recurrent weights below 8-bit precision, but that other components can be compressed further without significant loss of performance. Our results further show that PTQ only performs well on language-based LRA tasks whereas all others require QAT. Our investigation provides necessary insights for the continued development of efficient and hardware-optimized SSMs.
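To make the quantization setup concrete, below is a minimal sketch (in JAX, which the S5 implementation builds on) of symmetric uniform fake quantization with a straight-through estimator, the standard mechanism behind QAT; the same quantize/dequantize step applied once to already-trained weights corresponds to simple PTQ. The bit width, tensor names, and per-tensor scale handling here are illustrative assumptions, not the authors' exact scheme.

```python
import jax
import jax.numpy as jnp


def fake_quantize(w: jnp.ndarray, bits: int = 8) -> jnp.ndarray:
    """Simulate symmetric per-tensor uniform quantization of `w` to `bits` bits."""
    qmax = 2.0 ** (bits - 1) - 1.0                      # e.g. 127 for 8 bits
    scale = jnp.max(jnp.abs(w)) / qmax + 1e-12          # per-tensor scale
    w_q = jnp.clip(jnp.round(w / scale), -qmax, qmax)   # project onto the integer grid
    w_dq = w_q * scale                                  # dequantize back to float
    # Straight-through estimator: forward pass uses w_dq, gradients treat it as identity.
    return w + jax.lax.stop_gradient(w_dq - w)


# Toy usage: quantize a recurrent state matrix before running the recurrence.
key = jax.random.PRNGKey(0)
A = 0.1 * jax.random.normal(key, (16, 16))
A_q8 = fake_quantize(A, bits=8)
print(float(jnp.max(jnp.abs(A - A_q8))))  # error introduced at 8-bit precision
```

In a setup like this, sweeping `bits` separately for the recurrent state matrix and for the remaining parameters mirrors the per-component sensitivity study described in the abstract.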
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - LSTM-QGAN: Scalable NISQ Generative Adversarial Network [3.596166341956192]
Current quantum generative adversarial networks (QGANs) still struggle with practical-sized data.
We propose LSTM-QGAN, a QGAN architecture that eliminates preprocessing and integrates quantum long short-term memory (QLSTM) to ensure scalable performance.
Our experiments show that LSTM-QGAN significantly enhances both performance and scalability over state-of-the-art QGAN models.
arXiv Detail & Related papers (2024-09-03T18:27:15Z) - HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme for LTI systems, called HOPE, that uses the Markov parameters appearing within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
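For context on the objects HOPE works with, here is a small sketch (not the HOPE parameterization itself) that computes the Markov parameters h_k = C A^(k-1) B of an LTI system and assembles them into a Hankel matrix; the toy system and dimensions are arbitrary assumptions.

```python
import jax
import jax.numpy as jnp


def markov_parameters(A, B, C, n: int) -> jnp.ndarray:
    """Return h_1, ..., h_n with h_k = C @ A^(k-1) @ B (scalars for a SISO system)."""
    def step(x, _):
        return A @ x, (C @ x).squeeze()
    _, h = jax.lax.scan(step, B, xs=None, length=n)
    return h


def hankel_matrix(h: jnp.ndarray, m: int) -> jnp.ndarray:
    """m-by-m Hankel matrix with H[i, j] = h[i + j] (needs len(h) >= 2m - 1)."""
    idx = jnp.arange(m)
    return h[idx[:, None] + idx[None, :]]


# Toy SISO system; the Hankel matrix collects its impulse response (Markov parameters).
d = 4
A = jnp.diag(jnp.linspace(0.5, 0.9, d))   # stable diagonal state matrix
B = jnp.ones((d, 1))
C = jnp.ones((1, d))
h = markov_parameters(A, B, C, n=7)
H = hankel_matrix(h, m=4)
print(H.shape)  # (4, 4)
```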
arXiv Detail & Related papers (2024-05-22T20:20:14Z) - Graph Neural Networks for Parameterized Quantum Circuits Expressibility Estimation [5.074765131677166]
This paper introduces a novel approach for expressibility estimation of quantum circuits using Graph Neural Networks (GNNs).
We demonstrate the predictive power of our GNN model with a dataset consisting of 25,000 samples from the noiseless IBM QASM Simulator and 12,000 samples from three distinct noisy quantum backends.
arXiv Detail & Related papers (2024-05-13T18:26:55Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study [90.34226812493083]
This work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models.
Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation.
To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning.
arXiv Detail & Related papers (2023-07-16T15:11:01Z) - Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT quantizes the network module by module in a systematic, gradated sequence and determines module-specific bit precisions, yielding quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
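As a rough illustration of the module-wise, mixed-precision idea (a generic sketch, not MQAT's gradated procedure), each parameter group below is quantized with its own bit width; the module names and bit assignments are invented for the example.

```python
import jax.numpy as jnp


def uniform_quantize(w: jnp.ndarray, bits: int) -> jnp.ndarray:
    """Symmetric per-tensor uniform quantization to `bits` bits (dequantized floats)."""
    qmax = 2.0 ** (bits - 1) - 1.0
    scale = jnp.max(jnp.abs(w)) / qmax + 1e-12
    return jnp.clip(jnp.round(w / scale), -qmax, qmax) * scale


def quantize_modules(params: dict, bits_per_module: dict, default_bits: int = 8) -> dict:
    """Quantize each named parameter tensor with its module-specific bit width."""
    return {name: uniform_quantize(w, bits_per_module.get(name, default_bits))
            for name, w in params.items()}


# Invented module names and bit assignments, purely for illustration.
params = {"backbone": jnp.linspace(-1.0, 1.0, 64).reshape(8, 8),
          "pose_head": jnp.linspace(-0.5, 0.5, 32).reshape(8, 4)}
quantized = quantize_modules(params, {"backbone": 4, "pose_head": 8})
print({k: v.shape for k, v in quantized.items()})
```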
arXiv Detail & Related papers (2023-03-12T21:01:54Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
With low-bit quantization, our FQSR achieves performance on par with its full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z) - End-to-End Quantum Machine Learning Implemented with Controlled Quantum Dynamics [0.9599644507730106]
This work presents a hardware-friendly end-to-end quantum machine learning scheme that can be implemented with imperfect near-term intermediate-scale quantum (NISQ) processors.
The proposal transforms the machine learning task to the optimization of controlled quantum dynamics, in which the learning model is parameterized by experimentally tunable control variables.
Our design also enables automated feature selection by encoding the raw input to quantum states through agent control variables.
arXiv Detail & Related papers (2020-03-30T17:44:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.