Quantized Feature Distillation for Network Quantization
- URL: http://arxiv.org/abs/2307.10638v1
- Date: Thu, 20 Jul 2023 07:08:24 GMT
- Title: Quantized Feature Distillation for Network Quantization
- Authors: Ke Zhu and Yin-Yin He and Jianxin Wu
- Abstract summary: Methods adopting the quantization-aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated.
This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD).
QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD).
- Score: 32.26577845735846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network quantization aims to accelerate and trim full-precision neural
network models by using low-bit approximations. Methods adopting the
quantization-aware training (QAT) paradigm have recently seen rapid growth,
but are often conceptually complicated. This paper proposes a novel and highly
effective QAT method, quantized feature distillation (QFD). QFD first trains a
quantized (or binarized) representation as the teacher, then quantizes the
network using knowledge distillation (KD). Quantitative results show that QFD
is more flexible and effective (i.e., quantization friendly) than previous
quantization methods. QFD surpasses existing methods by a noticeable margin on
not only image classification but also object detection, albeit being much
simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO
detection and segmentation, which verifies its potential in real-world
deployment. To the best of our knowledge, this is the first time that vision
transformers have been quantized in object detection and image segmentation
tasks.
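The recipe described in the abstract (fake-quantize features, train a quantized teacher, then distill into the quantized student) can be summarized in a minimal PyTorch-style sketch. The quantizer, the L2 loss, and all function names below are illustrative assumptions rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def fake_quantize(x, bits=4):
    """Uniform fake quantization with a straight-through estimator (STE).
    Generic QAT-style quantizer; not the exact quantizer used in QFD."""
    qmax = 2 ** bits - 1
    x_min, x_max = x.detach().min(), x.detach().max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    q = torch.round((x - x_min) / scale).clamp(0, qmax) * scale + x_min
    return x + (q - x).detach()  # forward: quantized values; backward: identity gradient

def qfd_feature_loss(student_feat, teacher_feat, bits=4):
    """Match the student's features to the teacher's quantized features
    (an L2 loss is assumed here for illustration)."""
    with torch.no_grad():
        target = fake_quantize(teacher_feat, bits)
    return F.mse_loss(student_feat, target)
```

In training, such a term would typically be added to the task loss, e.g. `loss = ce_loss + lam * qfd_feature_loss(student_feat, teacher_feat)`, with the teacher kept frozen.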
Related papers
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs (a sketch of the plain interval-propagation step follows this entry).
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
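For context on the QA-IBP entry above, the sketch below shows the standard interval bound propagation step through an affine layer. It is the plain floating-point version and does not model the quantization-aware adaptation; the function name is assumed for illustration.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate an elementwise input interval [lower, upper] through
    y = W @ x + b, returning sound output bounds (standard IBP)."""
    mid = (upper + lower) / 2.0          # interval centers
    rad = (upper - lower) / 2.0          # interval radii (non-negative)
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad            # worst-case spread through |W|
    return mid_out - rad_out, mid_out + rad_out
```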
- Green, Quantized Federated Learning over Wireless Networks: An Energy-Efficient Design [68.86220939532373]
The finite precision level is captured through the use of quantized neural networks (QNNs) that quantize weights and activations in fixed-precision format.
The proposed FL framework can reduce energy consumption until convergence by up to 70% compared to a baseline FL algorithm.
arXiv Detail & Related papers (2022-07-19T16:37:24Z)
- Learning Representations for CSI Adaptive Quantization and Feedback [51.14360605938647]
We propose an efficient method for adaptive quantization and feedback in frequency division duplexing systems.
Existing works mainly focus on the implementation of autoencoder (AE) neural networks for CSI compression.
We recommend two different methods: one based on post-training quantization, and one in which the codebook is found during the training of the AE.
arXiv Detail & Related papers (2022-07-13T08:52:13Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN (a concrete illustration follows this entry) and the sensitivity of different layers to quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
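The survey entry above mentions replacing floating-point operations with low-cost bitwise operations. A common concrete instance (not specific to this survey) is the XNOR-popcount dot product used in binarized networks; a minimal sketch, assuming vectors with entries in {-1, +1} packed into integers:

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors with entries in {-1, +1},
    each packed into an n-bit integer (bit 1 encodes +1, bit 0 encodes -1).
    XNOR counts matching positions, so no floating-point multiplies occur."""
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n  # (#matches) - (#mismatches)

# Example: a = [+1, -1, +1], b = [+1, +1, -1]  ->  dot = 1 - 1 - 1 = -1
print(binary_dot(0b101, 0b110, 3))  # prints -1
```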
- Toward Physically Realizable Quantum Neural Networks [15.018259942339446]
Current solutions for quantum neural networks (QNNs) pose scalability challenges; in particular, the exponential state space of QNNs makes training procedures hard to scale.
This paper presents a new model for QNNs that relies on band-limited Fourier expansions of transfer functions of quantum perceptrons.
arXiv Detail & Related papers (2022-03-22T23:03:32Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- A Statistical Framework for Low-bitwidth Training of Deep Neural Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z)
- Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs).
We propose a novel method to quantize BN by converting an affine transformation of two floating-point values to a fixed-point operation with a shared quantized scale (a minimal sketch follows this entry).
Our method is verified by experiments at the layer level on the CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-08-30T09:33:29Z)
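Following up on the batch-normalization entry above, here is a minimal sketch of the general idea: fold BN into an affine map y = a*x + b, then quantize a and b with one shared power-of-two scale so the operation runs as an integer multiply-add. The conventions below (per-channel folding, power-of-two scale, the x_scale parameter) are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def fold_and_quantize_bn(gamma, beta, mean, var, x_scale, eps=1e-5, frac_bits=8):
    """Fold BatchNorm into y = a*x + b, then quantize a and b so that the
    whole op runs as an integer multiply-add with one shared power-of-two scale."""
    a = gamma / np.sqrt(var + eps)             # per-channel multiplier
    b = beta - a * mean                        # per-channel offset
    a_scale = 2.0 ** (-frac_bits)              # shared power-of-two scale for a
    a_q = np.round(a / a_scale).astype(np.int32)
    # the offset shares the combined scale of (quantized input) * (quantized multiplier)
    b_q = np.round(b / (x_scale * a_scale)).astype(np.int32)
    return a_q, b_q, x_scale * a_scale         # output scale of the integer result

def quantized_bn(x_q, a_q, b_q):
    """Integer-only BN: the result times the returned output scale
    approximates gamma * (x - mean) / sqrt(var + eps) + beta."""
    return x_q.astype(np.int64) * a_q + b_q
```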
This list is automatically generated from the titles and abstracts of the papers on this site.