A probabilistic framework for dynamic quantization
- URL: http://arxiv.org/abs/2505.10689v1
- Date: Thu, 15 May 2025 20:26:46 GMT
- Title: A probabilistic framework for dynamic quantization
- Authors: Gabriele Santini, Francesco Paissan, Elisabetta Farella
- Abstract summary: Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate.
We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance.
- Score: 2.635385344076311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a probabilistic framework for dynamic quantization of neural networks that allows for a computationally efficient input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling the adaptive adjustment of the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Our method strikes the best trade-off between performance and computational overhead compared to standard quantization strategies.
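The sketch below is a hypothetical illustration of the idea in the abstract, not the authors' implementation: a lightweight surrogate predicts a per-input spread of the pre-activations, and the quantization step size is rescaled from that prediction instead of being fixed at calibration time. The module name `InputAdaptiveFakeQuant`, the linear surrogate over channel means, and the k-sigma clipping rule are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputAdaptiveFakeQuant(nn.Module):
    """Hypothetical sketch of per-input dynamic quantization of pre-activations.

    A lightweight surrogate (a single linear layer over channel-wise means)
    predicts a spread for each input; the quantization step size is then
    rescaled from that prediction on a per-input basis.
    """

    def __init__(self, num_channels: int, n_bits: int = 8, k: float = 3.0):
        super().__init__()
        self.surrogate = nn.Linear(num_channels, 1)  # lightweight: negligible memory overhead
        self.n_bits = n_bits
        self.k = k  # clipping range = k * predicted spread (illustrative choice)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: pre-activations of shape (batch, channels, H, W)
        qmax = 2 ** (self.n_bits - 1) - 1
        stats = x.mean(dim=(2, 3))                          # cheap per-input statistics
        spread = F.softplus(self.surrogate(stats))          # predicted spread, shape (batch, 1)
        scale = (self.k * spread / qmax).view(-1, 1, 1, 1)  # per-input step size
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale                                    # fake-quantized pre-activations
```

In a deployment such a module would sit in front of each quantized layer and feed its per-input scale to the integer kernels; the k-sigma mapping from predicted spread to step size is only one plausible choice.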
Related papers
- TensoMeta-VQC: A Tensor-Train-Guided Meta-Learning Framework for Robust and Scalable Variational Quantum Computing [60.996803677584424]
TensoMeta-VQC is a novel tensor-train (TT)-guided meta-learning framework designed to improve the robustness and scalability of VQC significantly.
Our framework fully delegates the generation of quantum circuit parameters to a classical TT network, effectively decoupling optimization from quantum hardware.
arXiv Detail & Related papers (2025-08-01T23:37:55Z)
- Optimizing Deep Neural Networks using Safety-Guided Self Compression [0.0]
This study introduces a novel safety-driven quantization framework that prunes and quantizes neural network weights.
The proposed methodology is rigorously evaluated on both a convolutional neural network (CNN) and an attention-based language model.
Experimental results reveal that our framework achieves up to a 2.5% enhancement in test accuracy relative to the original unquantized models.
arXiv Detail & Related papers (2025-05-01T06:50:30Z)
- Precision Neural Network Quantization via Learnable Adaptive Modules [27.323901068182234]
Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency.
We propose an effective learnable adaptive neural network quantization method, called Adaptive Step Size Quantization (ASQ).
arXiv Detail & Related papers (2025-04-24T05:46:25Z)
- Communication-Efficient Federated Learning via Clipped Uniform Quantization [3.38220960870904]
This paper presents a novel approach to enhance communication efficiency in federated learning through clipped uniform quantization.
By leveraging optimal clipping thresholds and client-specific adaptive quantization schemes, the proposed method significantly reduces bandwidth and memory requirements for model weight transmission between clients and the server.
In contrast to federated averaging, this design obviates the need to disclose client-specific data volumes to the server, thereby enhancing client privacy.
arXiv Detail & Related papers (2024-05-22T05:48:25Z)
- Model-based Offline Quantum Reinforcement Learning [4.912318087940015]
This paper presents the first algorithm for model-based offline quantum reinforcement learning.
It gives hope that a quantum advantage can be achieved as soon as sufficiently powerful quantum computers are available.
arXiv Detail & Related papers (2024-04-14T15:11:27Z)
- QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty.
We propose to adjust these distributions through weight finetuning to be more quantization-friendly.
Our method demonstrates its efficacy across three high-resolution image generation tasks.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
- Function-Space Regularization in Neural Networks: A Probabilistic Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z)
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds a decomposed approximation directly with quantized factors.
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- Generalization Metrics for Practical Quantum Advantage in Generative Models [68.8204255655161]
Generative modeling is a widely accepted natural use case for quantum computers.
We construct a simple and unambiguous approach to probe practical quantum advantage for generative modeling by measuring the algorithm's generalization performance.
Our simulation results show that our quantum-inspired models have up to a $68\times$ enhancement in generating unseen unique and valid samples.
arXiv Detail & Related papers (2022-01-21T16:35:35Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths (a rough sketch of such a gradient penalty follows this list).
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
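As a hedged illustration of the gradient-penalty idea in the entry above (not the paper's exact recipe): the first-order change of the loss under a bounded weight perturbation, such as rounding error, is controlled by the $\ell_1$ norm of the weight gradients, so penalizing that norm during training should make the network more robust to post-training quantization. The function name and the coefficient `lam` are assumptions for this sketch.

```python
import torch
import torch.nn as nn

def train_step_with_grad_l1(model: nn.Module,
                            criterion: nn.Module,
                            optimizer: torch.optim.Optimizer,
                            x: torch.Tensor,
                            y: torch.Tensor,
                            lam: float = 0.05) -> float:
    """One training step with an illustrative L1 penalty on weight gradients."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)

    # Keep the gradient graph so the penalty itself can be backpropagated
    # (double backpropagation).
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    penalty = sum(g.abs().sum() for g in grads)

    total = loss + lam * penalty
    total.backward()
    optimizer.step()
    return float(total.detach())
```

Penalizing the gradient norm costs one extra backward pass per step, which is the usual price of double backpropagation.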
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.