FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware
Transformation
- URL: http://arxiv.org/abs/2102.07444v1
- Date: Mon, 15 Feb 2021 10:35:20 GMT
- Title: FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware
Transformation
- Authors: Chaofan Tao, Rui Lin, Quan Chen, Zhaoyang Zhang, Ping Luo, Ngai Wong
- Abstract summary: Frequency-Aware Transformation (FAT) learns to transform network weights in the frequency domain before quantization.
FAT can be easily trained in low precision using simple standard quantizers.
Code will be available soon.
- Score: 31.546529106932205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning convolutional neural networks (CNNs) with low bitwidth is
challenging because performance may drop significantly after quantization.
Prior arts often discretize the network weights by carefully tuning
hyper-parameters of quantization (e.g. non-uniform stepsize and layer-wise
bitwidths), which are complicated and sub-optimal because the full-precision
and low-precision models have a large discrepancy. This work presents a novel
quantization pipeline, Frequency-Aware Transformation (FAT), which has several
appealing benefits. (1) Rather than designing complicated quantizers like
existing works, FAT learns to transform network weights in the frequency domain
before quantization, making them more amenable to training in low bitwidth. (2)
With FAT, CNNs can be easily trained in low precision using simple standard
quantizers without tedious hyper-parameter tuning. Theoretical analysis shows
that FAT improves both uniform and non-uniform quantizers. (3) FAT can be
easily plugged into many CNN architectures. When training ResNet-18 and
MobileNet-V2 in 4 bits, FAT plus a simple rounding operation already achieves
70.5% and 69.2% top-1 accuracy on ImageNet without bells and whistles,
outperforming recent state-of-the-art methods while reducing computation by 54.9X and 45.7X relative to full-precision models. Code will be available soon.
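As a rough illustration of the pipeline described above (a frequency-domain transform of the weights followed by a plain standard quantizer), here is a minimal PyTorch-style sketch. The FFT-plus-learnable-mask transform and all function names are assumptions made for illustration, not the paper's actual parameterization.

```python
# Hedged sketch of the FAT idea: transform weights in the frequency domain,
# then apply a plain uniform quantizer. The learnable soft mask below is a
# stand-in for the paper's learned transform, not its actual parameterization.
import torch

def uniform_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simple symmetric uniform quantizer with rounding (no hyper-parameter tuning)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

def fat_quantize(w: torch.Tensor, mask: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Frequency-aware transform (illustrative) followed by standard quantization."""
    spectrum = torch.fft.fft2(w)              # weights -> frequency domain
    shaped = spectrum * torch.sigmoid(mask)   # learnable reweighting of frequencies (assumed form)
    w_t = torch.fft.ifft2(shaped).real        # back to the spatial domain
    return uniform_quantize(w_t, bits)

# Example: a 3x3 conv weight tensor and a frequency mask of the same shape.
w = torch.randn(64, 32, 3, 3)
mask = torch.zeros_like(w, requires_grad=True)   # would be trained jointly with the network
w_q = fat_quantize(w, mask, bits=4)
print(w_q.unique().numel(), "distinct values after 4-bit quantization")
```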
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
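For context on the KL-based activation calibration mentioned in this entry, below is a minimal sketch of the standard KL-divergence scale search (the kind of metric the paper improves upon). The bin counts, search stride, and helper names are illustrative assumptions, not the paper's improved KL metric.

```python
# Hedged sketch of KL-based calibration for an activation quantization scale:
# the quantized histogram that best matches the original distribution decides
# the clipping threshold. Shown only to illustrate the baseline metric.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    p = p / (p.sum() + 1e-12)
    q = q / (q.sum() + 1e-12)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + 1e-12))))

def search_activation_scale(acts: np.ndarray, bits: int = 8, n_bins: int = 2048) -> float:
    """Pick a clipping threshold whose quantized histogram best matches the original."""
    hist, edges = np.histogram(np.abs(acts), bins=n_bins)
    n_levels = 2 ** (bits - 1)
    best_thr, best_kl = edges[-1], float("inf")
    for i in range(n_levels, n_bins + 1, 64):        # candidate clipping points
        ref = hist[:i].astype(np.float64).copy()
        ref[i - 1] += hist[i:].sum()                 # fold clipped mass into the last bin
        # simulate quantization by merging bins into n_levels groups
        chunks = np.array_split(ref, n_levels)
        q = np.concatenate([np.full(len(c), c.sum() / max(len(c), 1)) for c in chunks])
        kl = kl_divergence(ref, q)
        if kl < best_kl:
            best_kl, best_thr = kl, edges[i]
    return best_thr / (n_levels - 1)                 # threshold -> quantization step size

acts = np.random.randn(100_000)
print("calibrated scale:", search_activation_scale(acts))
```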
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference [28.912023025671868]
This work targets an adaptive data representation with variable-length encoding called DyBit.
We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy and speedup.
Experimental results demonstrate that the inference accuracy via DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization.
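As a toy illustration of the accuracy/precision trade-off DyBit targets, the sketch below picks the smallest per-tensor bit-width whose quantization error stays under a tolerance. It does not reproduce DyBit's variable-length encoding or hardware mapping; all names, thresholds, and candidate bit-widths are assumptions.

```python
# Toy sketch of per-tensor adaptive precision: choose the smallest bit-width whose
# relative quantization error is below a tolerance. Illustration only.
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def choose_bitwidth(w: np.ndarray, tol: float = 1e-2, candidates=(2, 3, 4, 6, 8)) -> int:
    for bits in candidates:                      # try the cheapest precision first
        err = np.mean((w - quantize(w, bits)) ** 2)
        if err <= tol * np.mean(w ** 2):         # relative MSE threshold
            return bits
    return candidates[-1]

layer_weights = {
    "smooth_conv": np.random.randn(256, 256) * 0.05,                     # well-behaved weights
    "outlier_fc": np.concatenate([np.random.randn(65_530) * 0.05,
                                  np.random.randn(6) * 2.0]).reshape(256, 256),  # a few large outliers
}
for name, w in layer_weights.items():
    print(name, "->", choose_bitwidth(w), "bits")
```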
arXiv Detail & Related papers (2023-02-24T08:46:01Z)
- Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks [1.398698203665363]
In this paper, we explore non-linear quantization techniques for exploiting lower bit precision.
We develop a Quantization-Aware Training (QAT) algorithm that allows training of low-bit-width Power-of-Two (PoT) networks.
At the same time, PoT quantization vastly reduces the computational complexity of the neural network.
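A minimal sketch of generic power-of-two quantization is shown below: each weight is snapped to a signed power of two (or zero), so multiplications can become bit shifts in hardware. The level construction is a common illustration, not necessarily the paper's exact QAT scheme.

```python
# Minimal sketch of power-of-two (PoT) weight quantization.
import numpy as np

def pot_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    n_exponents = 2 ** (bits - 1) - 1            # one code reserved for zero, one bit for sign
    max_exp = np.floor(np.log2(np.abs(w).max()))
    exponents = max_exp - np.arange(n_exponents) # allowed powers: 2^max_exp, 2^(max_exp-1), ...
    levels = np.concatenate(([0.0], 2.0 ** exponents))
    # snap each weight magnitude to the nearest allowed level, keep the sign
    idx = np.argmin(np.abs(np.abs(w)[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]

w = np.random.randn(8, 8) * 0.2
print(np.unique(np.abs(pot_quantize(w, bits=4))))   # magnitudes are zero or powers of two
```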
arXiv Detail & Related papers (2022-03-09T19:57:14Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
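Below is a hedged sketch of the dynamic weight slicing idea described above: a small gate picks an input-dependent width ratio, and only that leading slice of a convolution's filters is applied. The gate design, ratios, and class names are assumptions, not the DS-Net++ architecture.

```python
# Hedged sketch of dynamic weight slicing for a single conv layer.
import torch
import torch.nn.functional as F

class SliceableConv(torch.nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratios=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.05)
        self.ratios = ratios
        self.gate = torch.nn.Linear(in_ch, len(ratios))   # predicts a slice ratio from input stats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # choose a width ratio from pooled input statistics (hard argmax, first sample, for simplicity)
        logits = self.gate(x.mean(dim=(2, 3)))
        ratio = self.ratios[int(logits.argmax(dim=1)[0])]
        k = max(1, int(self.weight.shape[0] * ratio))
        return F.conv2d(x, self.weight[:k], padding=1)    # only the first k filters run

layer = SliceableConv(16, 64)
x = torch.randn(1, 16, 32, 32)
print(layer(x).shape)   # channel count varies with the predicted slice ratio
```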
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming problem.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M, without affecting word error rate (WER).
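The sketch below illustrates the two ideas named in this entry: a prediction network that is just a learned weighted average of recent label embeddings, and an output layer tied to the embedding matrix. The joint network and dimensions are simplified assumptions.

```python
# Hedged sketch of the tied & reduced decoder ideas (simplified joint network).
import torch

class TinyRNNTDecoder(torch.nn.Module):
    def __init__(self, vocab: int, dim: int, context: int = 2):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, dim)
        self.ctx_weights = torch.nn.Parameter(torch.ones(context) / context)  # learned averaging weights
        self.joint_proj = torch.nn.Linear(dim, dim)

    def forward(self, prev_labels: torch.Tensor, enc: torch.Tensor) -> torch.Tensor:
        # prediction network: weighted average of the last `context` label embeddings
        emb = self.embed(prev_labels)                            # (B, context, dim)
        pred = (emb * self.ctx_weights[None, :, None]).sum(dim=1)
        joint = torch.tanh(self.joint_proj(pred + enc))          # simplified stand-in joint network
        return joint @ self.embed.weight.t()                     # output layer tied to the embedding matrix

dec = TinyRNNTDecoder(vocab=1000, dim=256)
logits = dec(torch.randint(0, 1000, (4, 2)), torch.randn(4, 256))
print(logits.shape)   # (4, 1000)
```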
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
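To see why a logarithmic number system pairs naturally with multiplicative updates, here is a toy sketch where weights are stored as signs plus log-magnitudes and updated additively in log space. The update rule is a simplified illustration, not the LNS-Madam algorithm.

```python
# Toy multiplicative weight update in the log domain: multiplying a weight is just
# adding to its stored log-magnitude. Simplified illustration only.
import numpy as np

rng = np.random.default_rng(0)
sign = np.sign(rng.standard_normal(8))
log_mag = rng.standard_normal(8) - 2.0        # weights stored as (sign, log2 |w|)

def multiplicative_step(log_mag, sign, grad, lr=0.1):
    w = sign * 2.0 ** log_mag
    # move |w| up or down multiplicatively depending on the gradient direction
    return log_mag - lr * np.sign(grad * w)   # addition in log space == multiplication of |w|

grad = rng.standard_normal(8)
log_mag = multiplicative_step(log_mag, sign, grad)
print("updated weights:", sign * 2.0 ** log_mag)
```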
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
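For contrast with the weight-quantization point above (quantizing full-precision weights), the sketch below shows that conventional scheme: full-precision latent weights are quantized on the forward pass and gradients pass through the rounding via a straight-through estimator (STE). It is shown only to clarify what quantizing the full-precision weights means, not to reproduce the paper's direct-quantization method.

```python
# Sketch of the conventional latent-weight + STE scheme this paper argues against.
import torch

class STEQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=2):
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None          # straight-through: gradient skips the rounding

w_fp = torch.randn(16, 16, requires_grad=True)    # full-precision latent weights
x = torch.randn(8, 16)
loss = (x @ STEQuantize.apply(w_fp, 2)).pow(2).mean()
loss.backward()
print(w_fp.grad.abs().mean())          # latent weights still receive gradients
```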
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and use a differentiable method to search for them accurately.
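A minimal sketch of treating discrete weights as searchable variables is given below: each weight keeps logits over a fixed set of low-bit values, the forward pass uses the softmax-weighted expectation (which is differentiable), and the final network takes the argmax level. Temperature schedules and regularizers from the paper are omitted, and the level set is an assumption.

```python
# Hedged sketch of differentiable search over discrete weight values.
import torch

levels = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])            # assumed candidate low-bit values
logits = torch.zeros(64, 32, len(levels), requires_grad=True)  # one distribution per weight

def expected_weight(logits, levels, temperature=1.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    return (probs * levels).sum(dim=-1)                    # differentiable "soft" weight

def discretize(logits, levels):
    return levels[logits.argmax(dim=-1)]                   # final hard low-bit weight

x = torch.randn(8, 32)
loss = (x @ expected_weight(logits, levels).t()).pow(2).mean()
loss.backward()                                            # gradients flow to the search logits
print(discretize(logits, levels).unique())
```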
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)