A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
- URL: http://arxiv.org/abs/2310.02654v1
- Date: Wed, 4 Oct 2023 08:25:03 GMT
- Title: A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
- Authors: Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele
- Abstract summary: This study explores the quantisation-aware training (QAT) on time series Transformer models.
We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase.
- Score: 19.835810073852244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the quantisation-aware training (QAT) on time series
Transformer models. We propose a novel adaptive quantisation scheme that
dynamically selects between symmetric and asymmetric schemes during the QAT
phase. Our approach demonstrates that matching the quantisation scheme to the
real data distribution can reduce computational overhead while maintaining
acceptable precision. Moreover, our approach is robust when applied to
real-world data and mixed-precision quantisation, where most objects are
quantised to 4 bits. Our findings inform model quantisation and deployment
decisions while providing a foundation for advancing quantisation techniques.
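To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of adaptive scheme selection combined with fake quantisation as used in a QAT forward pass: a tensor whose observed range is roughly symmetric around zero is quantised symmetrically, otherwise an asymmetric (affine) scheme with a zero-point is used. The selection heuristic, the threshold, and the names choose_scheme and fake_quantise are assumptions for illustration; the 4-bit setting mirrors the mixed-precision experiments mentioned in the abstract.

```python
# Minimal sketch of adaptive symmetric/asymmetric quantisation for QAT.
# All names and the selection rule are illustrative assumptions, not the
# paper's actual algorithm.
import numpy as np

SKEW_THRESHOLD = 0.25  # assumed tolerance for treating a range as symmetric


def choose_scheme(x: np.ndarray) -> str:
    """Pick 'symmetric' when the data range is roughly centred on zero,
    otherwise 'asymmetric' (affine) quantisation."""
    lo, hi = float(x.min()), float(x.max())
    span = max(abs(lo), abs(hi)) + 1e-12
    imbalance = abs(abs(lo) - abs(hi)) / span  # 0 = perfectly symmetric range
    return "symmetric" if imbalance < SKEW_THRESHOLD else "asymmetric"


def fake_quantise(x: np.ndarray, bits: int = 4, scheme: str = "symmetric") -> np.ndarray:
    """Quantise then dequantise, as done in the forward pass during QAT."""
    if scheme == "symmetric":
        qmax = 2 ** (bits - 1) - 1                # e.g. 7 for 4 bits
        scale = np.max(np.abs(x)) / qmax + 1e-12
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale
    # Asymmetric (affine): a zero-point shifts the grid onto [min, max].
    qmax = 2 ** bits - 1                          # e.g. 15 for 4 bits
    scale = (x.max() - x.min()) / qmax + 1e-12
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.5, 1024)                      # zero-centred -> symmetric
    activations = np.maximum(rng.normal(0.0, 0.5, 1024), 0)   # post-ReLU -> asymmetric
    for name, t in [("weights", weights), ("activations", activations)]:
        scheme = choose_scheme(t)
        err = np.mean((t - fake_quantise(t, bits=4, scheme=scheme)) ** 2)
        print(f"{name}: scheme={scheme}, 4-bit MSE={err:.6f}")
```

In a full QAT pipeline the fake-quantise step would sit inside the model's forward pass with a straight-through estimator for gradients; here it only illustrates how matching the scheme to the data distribution affects quantisation error.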
Related papers
- Scalable quantum dynamics compilation via quantum machine learning [7.31922231703204]
Variational quantum compilation (VQC) methods employ variational optimization to reduce gate costs while maintaining high accuracy.
We show that our approach exceeds state-of-the-art compilation results in both system size and accuracy in one dimension ($1$D).
For the first time, we extend VQC to systems on two-dimensional (2D) strips with a quasi-1D treatment, demonstrating a significant resource advantage over standard Trotterization methods.
arXiv Detail & Related papers (2024-09-24T18:00:00Z)
- Adaptive quantization with mixed-precision based on low-cost proxy [8.527626602939105]
This paper proposes a novel model quantization method, named the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ)
The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module is developed to evaluate the quantization sensitivity.
Experiments on the ImageNet demonstrate that the proposed LCPAQ achieves comparable or superior quantization accuracy to existing mixed-precision models.
arXiv Detail & Related papers (2024-02-27T17:36:01Z)
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers [10.566264033360282]
Post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile phones and TVs.
In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency.
arXiv Detail & Related papers (2024-02-14T05:58:43Z)
- MRQ: Support Multiple Quantization Schemes through Model Re-Quantization [0.17499351967216337]
Deep learning models cannot be easily quantized for diverse fixed-point hardware.
A new type of model quantization approach, called model re-quantization, is proposed.
Models obtained from the re-quantization process have been successfully deployed on NNA in the Echo Show devices.
arXiv Detail & Related papers (2023-08-01T08:15:30Z)
- Temporal Dynamic Quantization for Diffusion Models [18.184163233551292]
We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information.
Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference.
Our experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
arXiv Detail & Related papers (2023-06-04T09:49:43Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate extensive state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and find a considerable number of intuitive and counter-intuitive insights.
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy with the DeiT-B model on the ImageNet dataset using about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)