Exploring the Potential of Low-bit Training of Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2006.02804v4
- Date: Wed, 14 Jul 2021 05:54:48 GMT
- Title: Exploring the Potential of Low-bit Training of Convolutional Neural
Networks
- Authors: Kai Zhong, Xuefei Ning, Guohao Dai, Zhenhua Zhu, Tianchen Zhao, Shulin
Zeng, Yu Wang, Huazhong Yang
- Abstract summary: We propose a low-bit training framework for convolutional neural networks.
Our framework is built around a novel multi-level scaling (MLS) tensor format.
Experiments show that our framework achieves a superior trade-off between accuracy and bit-width.
- Score: 16.72709290595995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a low-bit training framework for convolutional
neural networks, which is built around a novel multi-level scaling (MLS) tensor
format. Our framework focuses on reducing the energy consumption of convolution
operations by quantizing all the convolution operands to low bit-width format.
Specifically, we propose the MLS tensor format, in which the element-wise
bit-width can be largely reduced. Then, we describe the dynamic quantization
and the low-bit tensor convolution arithmetic to leverage the MLS tensor format
efficiently. Experiments show that our framework achieves a better trade-off
between accuracy and bit-width than previous low-bit training frameworks. For
training a variety of models on CIFAR-10, a 1-bit mantissa and a 2-bit exponent
are adequate to keep the accuracy loss within $1\%$; on larger datasets such as
ImageNet, a 4-bit mantissa and a 2-bit exponent suffice to keep the accuracy
loss within $1\%$. Through the energy consumption
simulation of the computing units, we can estimate that training a variety of
models with our framework could achieve $8.3\sim10.2\times$ and
$1.9\sim2.3\times$ higher energy efficiency than training with full-precision
and 8-bit floating-point arithmetic, respectively.
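To make the MLS idea above concrete, here is a minimal NumPy sketch of a multi-level-scaling-style quantizer with a per-tensor floating-point scale, a per-group shared exponent, and a per-element low-bit mantissa; the group size, bit-widths, and function names are illustrative assumptions, not the paper's exact format or arithmetic.

import numpy as np

def mls_quantize(x, group_size=16, mant_bits=4, exp_bits=2):
    # Sketch of a multi-level-scaling (MLS) style quantizer (assumed layout):
    #   level 1: one floating-point scale per tensor,
    #   level 2: one low-bit shared exponent per group,
    #   level 3: a low-bit signed mantissa per element.
    groups = x.reshape(-1, group_size)
    t_scale = np.abs(groups).max() + 1e-12          # per-tensor scale
    xn = groups / t_scale                           # now roughly in [-1, 1]
    g_max = np.abs(xn).max(axis=1, keepdims=True) + 1e-12
    exp = np.clip(np.ceil(np.log2(g_max)), -(2 ** exp_bits - 1), 0)
    levels = 2 ** mant_bits - 1                     # mantissa magnitude levels
    mant = np.clip(np.rint(xn / 2.0 ** exp * levels), -levels, levels)
    return mant.astype(np.int8), exp.astype(np.int8), t_scale

def mls_dequantize(mant, exp, t_scale, mant_bits=4):
    levels = 2 ** mant_bits - 1
    return mant / levels * 2.0 ** exp * t_scale

x = np.random.randn(4, 64).astype(np.float32)
mant, exp, scale = mls_quantize(x.ravel())
x_hat = mls_dequantize(mant, exp, scale).reshape(x.shape)
print("max reconstruction error:", np.abs(x - x_hat).max())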
Related papers
- Towards Accurate and Efficient Sub-8-Bit Integer Training [24.853958178296587]
Quantization enables low-bitwidth formats in neural network training.
Recent methods have developed new data formats and additional pre-processing operations on quantizers.
It remains quite challenging to achieve high accuracy and efficiency simultaneously.
arXiv Detail & Related papers (2024-11-17T03:32:36Z)
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC)
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
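As background for the K-FAC entry above, here is a minimal NumPy sketch of the standard Kronecker-factored preconditioning for a single dense layer (not the weight-sharing variants the paper analyzes); the random arrays a and g simply stand in for layer inputs and backpropagated output gradients.

import numpy as np

# Fisher approximation for a dense layer: F ~= A (x) G with A = E[a a^T]
# and G = E[g g^T], giving the preconditioned gradient G^{-1} dW A^{-1}.
rng = np.random.default_rng(0)
batch, d_in, d_out = 128, 32, 16
a = rng.normal(size=(batch, d_in))       # layer inputs (activations)
g = rng.normal(size=(batch, d_out))      # backpropagated output gradients

dW = g.T @ a / batch                     # plain gradient w.r.t. W (d_out x d_in)
A = a.T @ a / batch                      # input second-moment factor
G = g.T @ g / batch                      # output-gradient second-moment factor

damping = 1e-3                           # Tikhonov damping for invertibility
A_inv = np.linalg.inv(A + damping * np.eye(d_in))
G_inv = np.linalg.inv(G + damping * np.eye(d_out))

precond_dW = G_inv @ dW @ A_inv          # K-FAC preconditioned update direction
print(precond_dW.shape)                  # (16, 32)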
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
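The guarantee described above boils down to bounding the accumulator against the worst-case dot product. A back-of-the-envelope sketch, assuming signed weights, unsigned activations, and dot-product length k (the paper's actual constraint is imposed during training and may differ):

import math

def min_accumulator_bits(k, weight_bits, act_bits):
    # Worst-case |sum| for k products of signed weights and unsigned activations.
    max_w = 2 ** (weight_bits - 1)        # e.g. |-8| for 4-bit signed weights
    max_a = 2 ** act_bits - 1             # e.g. 15 for 4-bit unsigned activations
    bound = k * max_w * max_a
    return math.ceil(math.log2(bound + 1)) + 1   # +1 for the sign bit

# 3x3 conv with 128 input channels, 4-bit weights and activations:
print(min_accumulator_bits(3 * 3 * 128, 4, 4))   # -> 19 bits for this example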
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
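One simple way such a decomposition can work, sketched below with assumed names: each bit-plane of an unsigned low-bit tensor is mapped to a {-1, +1} branch with a power-of-two coefficient plus a constant offset. The paper's exact encoding and acceleration scheme may differ.

import numpy as np

def binary_branches(q, bits):
    # Decompose an unsigned integer tensor q in [0, 2**bits - 1] into
    # `bits` tensors with entries in {-1, +1} plus per-branch coefficients
    # and a constant offset, so that q = sum_i coef[i] * s[i] + offset.
    branches, coefs = [], []
    for i in range(bits):
        b = (q >> i) & 1                  # i-th bit-plane in {0, 1}
        branches.append(2 * b - 1)        # map to {-1, +1}
        coefs.append(2.0 ** i / 2.0)      # 2^i * (s+1)/2 = 2^(i-1) s + 2^(i-1)
    offset = (2 ** bits - 1) / 2.0        # sum of the 2^(i-1) constants
    return branches, coefs, offset

bits = 3
q = np.random.randint(0, 2 ** bits, size=(4, 4))
s, c, off = binary_branches(q, bits)
recon = sum(ci * si for ci, si in zip(c, s)) + off
assert np.allclose(recon, q)              # exact reconstruction of the codes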
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of activations by 12x and enables training with a 6.6x to 14x larger batch size.
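A minimal sketch of the idea of 2-bit compressed activation storage, assuming per-group min/max scaling and unbiased stochastic rounding; a real implementation such as ActNN additionally bit-packs four 2-bit codes per byte and hooks into autograd.

import numpy as np

def compress_2bit(act, group_size=256, rng=np.random.default_rng(0)):
    # Stochastically round activations to 2 bits (4 levels) per group,
    # keeping per-group offset/scale so they can be dequantized in backward.
    g = act.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    scale = (g.max(axis=1, keepdims=True) - lo) / 3.0 + 1e-12   # levels 0..3
    norm = (g - lo) / scale
    q = np.floor(norm + rng.random(norm.shape))     # unbiased: E[q] == norm
    return np.clip(q, 0, 3).astype(np.uint8), lo, scale

def decompress_2bit(q, lo, scale, shape):
    return (q * scale + lo).reshape(shape)

act = np.random.randn(8, 256).astype(np.float32)
q, lo, sc = compress_2bit(act)
act_hat = decompress_2bit(q, lo, sc, act.shape)
print("mean abs error:", np.abs(act - act_hat).mean())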
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference [7.886868529510128]
Quantization maps floating-point weights and activations in a trained model to low-bitwidth integer values using scale factors.
Excessive quantization, reducing precision too aggressively, results in accuracy degradation.
Per-vector scale factors can be implemented with low-bitwidth integers when using a two-level quantization scheme.
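The two-level scheme mentioned above can be sketched as follows: an integer scale per vector sits on top of a single floating-point scale per tensor, so the per-vector scales themselves stay low-bit. Vector length, bit-widths, and function names below are illustrative assumptions.

import numpy as np

def vs_quant(w, vec_len=16, bits=4, scale_bits=4):
    v = w.reshape(-1, vec_len)
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    smax = 2 ** scale_bits - 1                      # integer per-vector scale range
    per_vec = np.abs(v).max(axis=1, keepdims=True) / qmax + 1e-12
    coarse = per_vec.max() / smax                   # per-tensor FP scale
    s_int = np.clip(np.rint(per_vec / coarse), 1, smax)   # low-bit integer scales
    q = np.clip(np.rint(v / (s_int * coarse)), -qmax, qmax).astype(np.int8)
    return q, s_int.astype(np.uint8), coarse

def vs_dequant(q, s_int, coarse, shape):
    return (q * s_int * coarse).reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
q, s, c = vs_quant(w.ravel())
w_hat = vs_dequant(q, s, c, w.shape)
print("rmse:", np.sqrt(np.mean((w - w_hat) ** 2)))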
arXiv Detail & Related papers (2021-02-08T19:56:04Z)
- Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks [2.666640112616559]
We propose an in-training quantization method for neural network models.
Our method calculates a bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy.
We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet with VGG19/ResNet18 architectures.
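As a rough illustration of the activation-density idea only (the paper's actual bit-width rule is derived during training and is more principled than this heuristic), a layer's density can be measured as the fraction of non-zero post-ReLU activations and mapped to a candidate bit-width:

import numpy as np

def bitwidth_from_density(post_relu, candidates=(2, 4, 8)):
    # Heuristic sketch: denser non-zero activations get more bits; layers
    # that are mostly zero after ReLU get fewer bits.
    density = np.count_nonzero(post_relu) / post_relu.size
    idx = min(int(density * len(candidates)), len(candidates) - 1)
    return candidates[idx]

layer_act = np.maximum(np.random.randn(1000), 0)   # toy post-ReLU activations
print(bitwidth_from_density(layer_act))            # -> 4 for ~50% density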
arXiv Detail & Related papers (2021-01-12T09:01:44Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
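A toy sketch of a progressive precision schedule in the spirit of FracTrain, assuming a fixed list of (forward, backward) bit-width stages; the actual method also adapts precision dynamically per input.

def precision_schedule(epoch, total_epochs, stages=((3, 3), (4, 4), (6, 6), (8, 8))):
    # The (forward, backward) bit-widths grow as training progresses.
    stage = min(int(epoch / total_epochs * len(stages)), len(stages) - 1)
    return stages[stage]

for e in (0, 30, 60, 90):
    print(e, precision_schedule(e, 100))
# -> (3, 3), (4, 4), (6, 6), (8, 8) respectively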
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination [11.173092834726528]
It is desirable to directly train a compact neural network from scratch with low memory and low computational cost.
Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large-size neural networks.
This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks.
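A minimal sketch of training with a factorized layer rather than compressing afterwards: only the low-rank factors are ever stored and updated. The class and parameter names are assumptions; the paper additionally uses a Bayesian prior to determine the ranks automatically.

import numpy as np

class LowRankLinear:
    # W is never materialized at full rank; only U (n_in x r) and
    # V (r x n_out) are stored and updated.
    def __init__(self, n_in, n_out, rank, rng=np.random.default_rng(0)):
        self.U = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, rank))
        self.V = rng.normal(scale=1.0 / np.sqrt(rank), size=(rank, n_out))

    def forward(self, x):                 # x: (batch, n_in)
        return (x @ self.U) @ self.V      # O(batch * r * (n_in + n_out)) FLOPs

layer = LowRankLinear(512, 512, rank=16)  # ~16x fewer weights than 512x512
x = np.random.randn(8, 512)
print(layer.forward(x).shape)             # (8, 512)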
arXiv Detail & Related papers (2020-10-17T01:23:26Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline is due to activation quantization and replace the conventional ReLU with a Bounded ReLU.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
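A small sketch of why a Bounded ReLU helps integer-only inference, assuming a clip bound of 6 and uniform activation quantization (the paper's bound and quantizer details may differ): clipping fixes the activation range, so a single static integer scale covers it without outliers.

import numpy as np

def bounded_relu(x, bound=6.0):
    # Clipping the activation range makes uniform integer quantization
    # well-posed: every value maps into [0, bound].
    return np.clip(x, 0.0, bound)

def quantize_act(x, bound=6.0, bits=8):
    scale = bound / (2 ** bits - 1)
    return np.rint(bounded_relu(x) / scale).astype(np.uint8), scale

x = np.random.randn(4) * 4
q, s = quantize_act(x)
print(q, q * s)   # integer codes and their dequantized values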
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks [13.929168096016957]
We introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers.
Reduced bit precision allows for a larger effective memory and increased computational speed.
We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models.
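A rough emulation of casting to an FP8-like format after a per-tensor exponent shift, to illustrate why moving the tensor statistics into the representable range matters; the bit layout (5-bit exponent, 2-bit mantissa here) and the shift rule are assumptions, not the paper's exact shifted-and-squeezed transform.

import numpy as np

def fake_fp8(x, exp_bits=5, man_bits=2, shift=0.0):
    # Emulate an FP8-like cast (sign + exp_bits exponent + man_bits mantissa)
    # applied after scaling the tensor by 2**shift, then undo the shift.
    y = x * 2.0 ** shift
    sign = np.sign(y)
    mag = np.abs(y) + 1e-45                       # avoid log2(0)
    e = np.clip(np.floor(np.log2(mag)),
                -(2 ** (exp_bits - 1)) + 1, 2 ** (exp_bits - 1))
    m = np.rint(mag / 2.0 ** e * 2 ** man_bits) / 2 ** man_bits
    return sign * m * 2.0 ** e / 2.0 ** shift

x = (np.random.randn(5) * 1e-3).astype(np.float32)
shift = -np.log2(np.abs(x).max())                 # center the tensor near 1.0
print(x)
print(fake_fp8(x, shift=shift))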
arXiv Detail & Related papers (2020-01-16T06:38:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.