GranQ: Granular Zero-Shot Quantization with Unified Layer-Channel Awareness
- URL: http://arxiv.org/abs/2503.18339v3
- Date: Mon, 28 Apr 2025 08:30:11 GMT
- Title: GranQ: Granular Zero-Shot Quantization with Unified Layer-Channel Awareness
- Authors: Inpyo Hong, Youngwan Jo, Hyojeong Lee, Sunghyun Ahn, Sanghyun Park
- Abstract summary: GranQ is a novel ZSQ approach that leverages layer-channel awareness to minimize the quantization error. GranQ achieves superior performance compared with state-of-the-art ZSQ methods that employ quantization-aware training.
- Score: 1.8067835669244101
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Zero-shot quantization (ZSQ) enables neural network compression without training data, which is crucial in restricted data access environments. However, existing ZSQ methods suffer from significant activation loss in low-bit environments owing to their coarse-grained scaling strategy. To address this issue, we propose GranQ, a novel ZSQ approach that leverages layer-channel awareness to minimize the quantization error. Unlike conventional layer- or channel-wise quantization, GranQ dynamically adjusts quantization granularity by considering both layer- and channel-level activation distributions. This enables fine-grained quantization while minimizing activation distortion. Additionally, we introduce vectorized activation quantization, which enables efficient parallel computation and reduces computational overhead while preserving accuracy. GranQ achieves superior performance compared with state-of-the-art ZSQ methods that employ quantization-aware training. With these findings, we anticipate that GranQ will inspire novel research directions beyond conventional ZSQ approaches focused on data generation and model training.
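To make the layer-channel idea concrete, the sketch below applies per-channel activation scales within a layer in a single vectorized pass, with the layer-level range as a fallback for near-empty channels. The function name, quantizer, and fallback rule are illustrative assumptions, not GranQ's exact formulation.

```python
import torch

def granular_activation_quant(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Illustrative layer-channel aware activation quantization.

    x: activation tensor of shape (N, C, H, W).
    Scales are computed per channel (vectorized, no Python loop over channels)
    and the layer-level range is used as a fallback for near-empty channels.
    This is a sketch of the general idea, not the authors' exact method.
    """
    qmax = 2 ** (n_bits - 1) - 1

    # Per-channel absolute maximum over batch and spatial dims (vectorized).
    ch_absmax = x.abs().amax(dim=(0, 2, 3))          # shape (C,)
    # Layer-level absolute maximum as a coarse fallback.
    layer_absmax = x.abs().amax()

    # Channels with negligible range fall back to the layer-wise scale.
    ch_absmax = torch.where(ch_absmax > 1e-8, ch_absmax, layer_absmax)

    scale = (ch_absmax / qmax).view(1, -1, 1, 1)     # broadcast over N, H, W

    # Quantize and dequantize (simulated quantization).
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return x_q * scale
```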
Related papers
- Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization [0.0]
Post-training quantization has emerged as a widely used technique for compressing large language models (LLMs) without retraining.
The accumulation of quantization errors across layers significantly degrades performance, particularly in low-bit regimes.
We propose Quantization Error Propagation (QEP), a lightweight and general framework that enhances layer-wise PTQ by explicitly propagating the quantization error.
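A minimal sketch of the error-propagation idea, assuming a toy ReLU MLP and a simple symmetric quantizer (both are assumptions, not QEP's actual setup): later layers are calibrated on activations produced by the already-quantized prefix of the network, so earlier quantization error is visible rather than silently accumulating.

```python
import numpy as np

def quantize_weights(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Simple symmetric per-tensor weight quantizer (illustrative)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def layerwise_ptq_with_propagation(weights, x_calib, n_bits=4):
    """Quantize a stack of linear layers one at a time, feeding each layer the
    activations from the already-quantized prefix so that earlier quantization
    error can be seen (and compensated) when later layers are calibrated.
    The names and the quantizer are assumptions; QEP additionally applies an
    explicit error-correction term on top of this idea."""
    x_fp = x_calib.copy()   # activations along the full-precision path
    x_q = x_calib.copy()    # activations along the quantized path
    q_weights = []
    for w in weights:
        wq = quantize_weights(w, n_bits)
        # (A more faithful variant would adjust wq to best map x_q onto the
        #  full-precision target x_fp @ w, e.g. via least squares.)
        x_fp = np.maximum(x_fp @ w, 0.0)    # ReLU MLP as a toy stand-in
        x_q = np.maximum(x_q @ wq, 0.0)
        q_weights.append(wq)
    return q_weights
```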
arXiv Detail & Related papers (2025-04-13T15:56:00Z)
- Leveraging Pre-Trained Neural Networks to Enhance Machine Learning with Variational Quantum Circuits [48.33631905972908]
We introduce an innovative approach that utilizes pre-trained neural networks to enhance Variational Quantum Circuits (VQC).
This technique effectively separates approximation error from qubit count and removes the need for restrictive conditions.
Our results extend to applications such as human genome analysis, demonstrating the broad applicability of our approach.
arXiv Detail & Related papers (2024-11-13T12:03:39Z)
- QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG)
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
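The "quantization as implicit regularizer" idea can be sketched as standard fake quantization during training: the weights used in the forward pass carry quantization noise while gradients pass straight through. The class and function names below are illustrative; QT-DoG's full domain-generalization recipe is not reproduced here.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Fake-quantize in the forward pass, pass gradients straight through."""
    @staticmethod
    def forward(ctx, w, n_bits):
        qmax = 2 ** (n_bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator; no grad for n_bits

class QuantLinear(torch.nn.Linear):
    """Linear layer whose weights see quantization noise during training.
    A generic QAT sketch of the implicit-regularizer idea, not QT-DoG itself."""
    def __init__(self, in_features, out_features, n_bits=8):
        super().__init__(in_features, out_features)
        self.n_bits = n_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, self.n_bits)
        return torch.nn.functional.linear(x, w_q, self.bias)
```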
arXiv Detail & Related papers (2024-10-08T13:21:48Z)
- Constraint Guided Model Quantization of Neural Networks [0.0]
Constraint Guided Model Quantization (CGMQ) is a quantization-aware training algorithm that enforces an upper bound on computational resources while reducing the bit-widths of the network's parameters.
It is shown on MNIST that the performance of CGMQ is competitive with state-of-the-art quantization-aware training algorithms.
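As a rough sketch of working under a resource upper bound (see the code below), one can greedily lower layer bit-widths until a cost budget is met. CGMQ itself folds the constraint into the training objective, so this is only an illustration of the budget constraint, with hypothetical names and a params-times-bits cost model.

```python
def allocate_bitwidths(layer_sizes, cost_budget, candidate_bits=(8, 6, 4, 2)):
    """Toy constraint-guided bit allocation: start every layer at the highest
    precision and greedily lower the bit-width of the costliest remaining layer
    until the total cost (here simply #params * bits) fits the budget."""
    bits = {i: candidate_bits[0] for i in range(len(layer_sizes))}
    cost = lambda: sum(layer_sizes[i] * b for i, b in bits.items())
    while cost() > cost_budget:
        reducible = [i for i, b in bits.items() if b > candidate_bits[-1]]
        if not reducible:
            break  # budget unreachable even at minimum precision
        j = max(reducible, key=lambda k: layer_sizes[k] * bits[k])
        bits[j] = next(b for b in candidate_bits if b < bits[j])
    return bits

# Example: three layers with 1.2M, 0.3M, and 0.05M parameters, 8 Mbit budget.
print(allocate_bitwidths([1_200_000, 300_000, 50_000], cost_budget=8_000_000))
```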
arXiv Detail & Related papers (2024-09-30T09:41:16Z)
- Quantum-Train: Rethinking Hybrid Quantum-Classical Machine Learning in the Model Compression Perspective [7.7063925534143705]
We introduce the Quantum-Train(QT) framework, a novel approach that integrates quantum computing with machine learning algorithms.
QT achieves remarkable results by employing a quantum neural network alongside a classical mapping model.
arXiv Detail & Related papers (2024-05-18T14:35:57Z)
- Gradient-based Automatic Mixed Precision Quantization for Neural Networks On-Chip [0.9187138676564589]
We present High Granularity Quantization (HGQ), an innovative quantization-aware training method.
HGQ fine-tunes the per-weight and per-activation precision by making them optimizable through gradient descent.
This approach enables ultra-low-latency, low-power neural networks on hardware capable of performing arithmetic operations.
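A simplified stand-in for gradient-optimizable precision is LSQ-style fake quantization with a learnable step size, sketched below; HGQ goes further and learns per-weight and per-activation bit-widths directly, so treat this only as an illustration of how precision-related parameters can be trained with SGD.

```python
import torch

def lsq_fake_quant(w: torch.Tensor, step: torch.Tensor, qmax: int) -> torch.Tensor:
    """LSQ-style fake quantization with a learnable step size.

    The rounding uses a straight-through estimator, so gradients reach both the
    weights and the step size; a smaller learned step behaves like higher
    effective precision over the clipped range. This is a simplified stand-in,
    not HGQ's per-weight bit-width optimization."""
    v = torch.clamp(w / step, -qmax - 1, qmax)
    v_q = (torch.round(v) - v).detach() + v   # straight-through rounding
    return v_q * step

# Usage sketch: make `step` a trainable parameter and add a resource penalty
# (e.g. proportional to the implied bit count) to the task loss.
w = torch.randn(64, 64, requires_grad=True)
step = torch.nn.Parameter(torch.tensor(0.05))
w_q = lsq_fake_quant(w, step, qmax=7)
loss = w_q.pow(2).mean()                      # toy task loss
loss.backward()                               # gradients flow to w and step
```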
arXiv Detail & Related papers (2024-05-01T17:18:46Z)
- Calibrating the role of entanglement in variational quantum circuits [0.6435156676256051]
Entanglement is a key property of quantum computing that separates it from its classical counterpart.
We systematically probe the role of entanglement in the working of two variational quantum algorithms.
We find that for the MAX-CUT problem solved using QAOA, the fidelity as a function of entanglement is highly dependent on the number of layers.
In the case of QNNs, trained circuits with high test accuracies are underpinned by higher entanglement, with any enforced limitation in entanglement resulting in a sharp decline in test accuracy.
arXiv Detail & Related papers (2023-10-16T23:36:40Z)
- Scaling Limits of Quantum Repeater Networks [62.75241407271626]
Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing.
Due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability.
In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed.
arXiv Detail & Related papers (2023-05-15T14:57:01Z)
- Accelerating the training of single-layer binary neural networks using the HHL quantum algorithm [58.720142291102135]
This paper shows that useful information can be extracted from the quantum-mechanical implementation of Harrow-Hassidim-Lloyd (HHL) and used to reduce the complexity of finding the solution on the classical side.
arXiv Detail & Related papers (2022-10-23T11:58:05Z)
- Synergy Between Quantum Circuits and Tensor Networks: Short-cutting the Race to Practical Quantum Advantage [43.3054117987806]
We introduce a scalable procedure for harnessing classical computing resources to provide pre-optimized initializations for quantum circuits.
We show this method significantly improves the trainability and performance of PQCs on a variety of problems.
By demonstrating a means of boosting limited quantum resources using classical computers, our approach illustrates the promise of this synergy between quantum and quantum-inspired models in quantum computing.
arXiv Detail & Related papers (2022-08-29T15:24:03Z)
- Optimizing Tensor Network Contraction Using Reinforcement Learning [86.05566365115729]
We propose a Reinforcement Learning (RL) approach combined with Graph Neural Networks (GNN) to address the contraction ordering problem.
The problem is extremely challenging due to the huge search space, the heavy-tailed reward distribution, and the challenging credit assignment.
We show how a carefully implemented RL-agent that uses a GNN as the basic policy construct can address these challenges.
arXiv Detail & Related papers (2022-04-18T21:45:13Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop at ultra-low precision (4 bits or fewer) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
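A minimal sketch of statistics-driven, training-free activation quantization: the clipping range is derived from per-channel mean and standard deviation rather than the raw min/max. The mean ± k·std rule and the names are assumptions; DAQ's actual scheme differs in detail.

```python
import torch

def distribution_aware_quant(x: torch.Tensor, n_bits: int = 4, k: float = 3.0):
    """Training-free activation quantization whose clipping range is derived
    from per-channel statistics (mean +/- k*std) instead of the raw min/max.
    Illustrative only: the statistics-driven idea is the same, but DAQ's exact
    normalization differs.  x: feature map of shape (N, C, H, W)."""
    levels = 2 ** n_bits - 1
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    std = x.std(dim=(0, 2, 3), keepdim=True)
    lo, hi = mean - k * std, mean + k * std

    scale = (hi - lo).clamp(min=1e-8) / levels
    x_q = torch.clamp(torch.round((x - lo) / scale), 0, levels)
    return x_q * scale + lo        # dequantized (simulated) activations
```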
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- A Statistical Framework for Low-bitwidth Training of Deep Neural Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
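A toy illustration of fully quantized training: weights, activations, and gradients all pass through a stochastic-rounding quantizer, which keeps each quantized value an unbiased estimate of its input (the kind of property such convergence analyses rely on for quantized gradients). The linear model and quantizer below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quant(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Per-tensor quantization with stochastic rounding, which keeps the
    quantized value an unbiased estimate of the input."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-8) / qmax
    v = np.clip(np.floor(x / scale + rng.random(x.shape)), -qmax - 1, qmax)
    return v * scale

# Toy fully quantized training step for a linear model y = x @ w:
# weights, activations, and gradients all pass through the quantizer.
x = rng.standard_normal((32, 16))
y = rng.standard_normal((32, 1))
w = rng.standard_normal((16, 1)) * 0.1
lr = 1e-2
for _ in range(100):
    xq, wq = stochastic_quant(x), stochastic_quant(w)
    pred = xq @ wq                              # quantized forward pass
    grad_w = xq.T @ (pred - y) / len(x)         # gradient of 0.5 * MSE
    w -= lr * stochastic_quant(grad_w)          # quantized gradient update
```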
arXiv Detail & Related papers (2020-10-27T13:57:33Z)