QGen: On the Ability to Generalize in Quantization Aware Training
- URL: http://arxiv.org/abs/2404.11769v2
- Date: Fri, 19 Apr 2024 16:50:05 GMT
- Title: QGen: On the Ability to Generalize in Quantization Aware Training
- Authors: MohammadHossein AskariHemmat, Ahmadreza Jeddi, Reyhane Askari Hemmat, Ivan Lazarevich, Alexander Hoffman, Sudhakar Sah, Ehsan Saboori, Yvon Savaria, Jean-Pierre David
- Abstract summary: Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations.
We develop a theoretical model for quantization in neural networks and demonstrate how quantization functions as a form of regularization.
- Score: 35.0485699853394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications for model performance. In particular, first, we develop a theoretical model for quantization in neural networks and demonstrate how quantization functions as a form of regularization. Second, motivated by recent work connecting the sharpness of the loss landscape and generalization, we derive an approximate bound for the generalization of quantized models conditioned on the amount of quantization noise. We then validate our hypothesis by experimenting with over 2000 models trained on CIFAR-10, CIFAR-100, and ImageNet datasets on convolutional and transformer-based models.
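The link between quantization noise, sharpness, and regularization can be illustrated with a textbook second-order argument. This is not the bound derived in the paper. Model rounding as additive noise eps, uniform on [-step/2, step/2] with variance step^2/12 and independent across weights. Near a minimum with Hessian H, the expected loss increase is then 0.5 * E[eps^T H eps] = (step^2 / 24) * trace(H), so sharper minima (larger trace(H)) pay a larger penalty. The sketch below checks this numerically. Its assumptions are stated loudly: a hypothetical diagonal quadratic loss, full-precision minima drawn uniformly from [-1, 1], and a toy round-to-nearest quantizer with no clipping. The helper names are hypothetical.

```python
import random

def quantize(x: float, step: float) -> float:
    # Toy uniform round-to-nearest quantizer (hypothetical: no clipping, no zero-point).
    return step * round(x / step)

def mean_quantization_penalty(h, step, trials=20000, seed=0):
    # Hypothetical diagonal quadratic loss L(w) = 0.5 * sum_i h_i * (w_i - w_star_i)**2.
    # Start exactly at a random full-precision minimum w_star, quantize it, and
    # average the resulting loss increase over many random minima.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w_star = [rng.uniform(-1.0, 1.0) for _ in h]
        eps = [quantize(w, step) - w for w in w_star]  # quantization noise per weight
        total += 0.5 * sum(hi * e * e for hi, e in zip(h, eps))
    return total / trials

def predicted_penalty(h, step):
    # Uniform-noise model: Var(eps) = step**2 / 12, hence E[dL] = step**2 / 24 * trace(H).
    return step ** 2 / 24.0 * sum(h)

if __name__ == "__main__":
    h = [1.0, 4.0, 10.0]  # hypothetical Hessian diagonal; larger entries = sharper directions
    for bits in (2, 4, 8):
        # Step size of a b-bit grid spanning [-1, 1]; the toy quantizer only uses the spacing.
        step = 2.0 / (2 ** bits - 1)
        print(bits, mean_quantization_penalty(h, step), predicted_penalty(h, step))
```

Under these assumptions the measured and predicted columns should agree closely. Because the penalty scales with step^2, each extra bit of precision shrinks it by roughly a factor of four.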
Related papers
- Quantum machine learning for image classification [39.58317527488534]
This research introduces two quantum machine learning models that leverage the principles of quantum mechanics for effective computations.
Our first model, a hybrid quantum neural network with parallel quantum circuits, enables the execution of computations even in the noisy intermediate-scale quantum era.
Our second model is a hybrid quantum neural network with a Quanvolutional layer that reduces image resolution via a convolution process.
Date: 2023-04-18T18:23:20Z
- A Framework for Demonstrating Practical Quantum Advantage: Racing Quantum against Classical Generative Models [62.997667081978825]
We build on a previously proposed framework for evaluating the generalization performance of generative models.
We establish the first comparative race towards practical quantum advantage (PQA) between classical and quantum generative models.
Our results suggest that quantum circuit Born machines (QCBMs) are more efficient in the data-limited regime than state-of-the-art classical generative models.
arXiv Detail & Related papers (2023-03-27T22:48:28Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
In theory, this provides a network at any precision on demand, while only one model needs to be trained and maintained.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - A didactic approach to quantum machine learning with a single qubit [68.8204255655161]
We focus on the case of learning with a single qubit, using data re-uploading techniques.
We implement the different proposed formulations on toy and real-world datasets using the qiskit quantum computing SDK.
arXiv Detail & Related papers (2022-11-23T18:25:32Z) - Generalization despite overfitting in quantum machine learning models [0.0]
We provide a characterization of benign overfitting in quantum models.
We show how a class of quantum models exhibits analogous features.
We intuitively explain these features according to the ability of the quantum model to interpolate noisy data with locally "spiky" behavior.
arXiv Detail & Related papers (2022-09-12T18:08:45Z) - Mixed-Precision Inference Quantization: Radically Towards Faster
inference speed, Lower Storage requirement, and Lower Loss [4.877532217193618]
Existing quantization techniques rely heavily on experience and "fine-tuning" skills.
This study provides a methodology for obtaining a mixed-precision quantization model with a lower loss than the full-precision model.
In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method.
arXiv Detail & Related papers (2022-07-20T10:55:34Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Generalization Metrics for Practical Quantum Advantage in Generative
Models [68.8204255655161]
Generative modeling is a widely accepted natural use case for quantum computers.
We construct a simple and unambiguous approach to probe practical quantum advantage for generative modeling by measuring the algorithm's generalization performance.
Our simulation results show that our quantum-inspired models achieve up to a 68× enhancement in generating unseen, unique, and valid samples.
arXiv Detail & Related papers (2022-01-21T16:35:35Z) - Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using DeiT-B model on ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z) - SQWA: Stochastic Quantized Weight Averaging for Improving the
Generalization Capability of Low-Precision Deep Neural Networks [29.187848543158992]
We present a new optimization approach for quantized neural networks, stochastic quantized weight averaging (SQWA).
The proposed approach consists of floating-point model training, direct quantization of the weights, capturing multiple low-precision models, averaging the captured models, and fine-tuning the averaged model with low learning rates.
With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets.
Date: 2020-02-02T07:02:51Z
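The SQWA entry above describes its pipeline only in prose. The following is a toy sketch of the averaging step alone, not the authors' implementation. It stands in for the captured low-precision models with quantized, randomly jittered copies of one full-precision weight vector; in SQWA those models come from actual low-precision training, which is not modeled here. The names quantize, average_models, and sqwa_style_demo, together with the jitter scale, step size, and snapshot count, are all hypothetical. Error is measured as the mean squared distance to the stand-in full-precision weights.

```python
import random

def quantize(x: float, step: float) -> float:
    # Hypothetical uniform quantizer: round to the nearest multiple of `step`.
    return step * round(x / step)

def average_models(snapshots):
    # Element-wise mean over equally sized weight vectors (the averaging step).
    return [sum(col) / len(snapshots) for col in zip(*snapshots)]

def mse(a, b):
    # Mean squared difference between two weight vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def sqwa_style_demo(n_weights=1000, n_snapshots=8, step=0.1, seed=0):
    rng = random.Random(seed)
    w_fp = [rng.gauss(0.0, 1.0) for _ in range(n_weights)]  # stand-in full-precision model
    # Stand-ins for captured low-precision models: jittered, then quantized, copies of w_fp.
    snapshots = [[quantize(w + rng.gauss(0.0, step), step) for w in w_fp]
                 for _ in range(n_snapshots)]
    w_avg = average_models(snapshots)             # usually lies off the quantization grid
    w_final = [quantize(w, step) for w in w_avg]  # back on the grid; fine-tuning is omitted
    return mse(snapshots[0], w_fp), mse(w_avg, w_fp), mse(w_final, w_fp)

if __name__ == "__main__":
    single, averaged, requantized = sqwa_style_demo()
    print(f"one snapshot: {single:.5f}  averaged: {averaged:.5f}  re-quantized: {requantized:.5f}")
```

In this toy, averaging cancels much of the snapshot-to-snapshot jitter, but the averaged weights generally fall between grid points. That is one reason a final low-learning-rate fine-tuning stage on the quantized model makes sense. Here that stage is replaced by plain re-quantization.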
This list is automatically generated from the titles and abstracts of the papers in this site.