Exploring the Potential of Low-bit Training of Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2006.02804v4
- Date: Wed, 14 Jul 2021 05:54:48 GMT
- Title: Exploring the Potential of Low-bit Training of Convolutional Neural
Networks
- Authors: Kai Zhong, Xuefei Ning, Guohao Dai, Zhenhua Zhu, Tianchen Zhao, Shulin
Zeng, Yu Wang, Huazhong Yang
- Abstract summary: We propose a low-bit training framework for convolutional neural networks.
Our framework is built around a novel multi-level scaling (MLS) tensor format.
Experiments show that our framework achieves a superior trade-off between accuracy and bit-width.
- Score: 16.72709290595995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a low-bit training framework for convolutional
neural networks, which is built around a novel multi-level scaling (MLS) tensor
format. Our framework focuses on reducing the energy consumption of convolution
operations by quantizing all the convolution operands to low bit-width format.
Specifically, we propose the MLS tensor format, in which the element-wise
bit-width can be largely reduced. Then, we describe the dynamic quantization
and the low-bit tensor convolution arithmetic to leverage the MLS tensor format
efficiently. Experiments show that our framework achieves a better trade-off
between accuracy and bit-width than previous low-bit training frameworks. For
training a variety of models on CIFAR-10, a 1-bit mantissa and a 2-bit exponent
are adequate to keep the accuracy loss within $1\%$; on larger datasets such as
ImageNet, a 4-bit mantissa and a 2-bit exponent suffice to keep the accuracy
loss within $1\%$. Through the energy consumption
simulation of the computing units, we can estimate that training a variety of
models with our framework could achieve $8.3\sim10.2\times$ and
$1.9\sim2.3\times$ higher energy efficiency than training with full-precision
and 8-bit floating-point arithmetic, respectively.
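To make the MLS idea above concrete, here is a minimal NumPy sketch of a multi-level-scaling-style quantizer with a per-tensor floating-point scale, a per-group shared exponent, and a per-element low-bit mantissa; the group size, bit-widths, and function names are illustrative assumptions, not the paper's exact format or arithmetic.

import numpy as np

def mls_quantize(x, group_size=16, mant_bits=4, exp_bits=2):
    # Sketch of a multi-level-scaling (MLS) style quantizer (assumed layout):
    #   level 1: one floating-point scale per tensor,
    #   level 2: one low-bit shared exponent per group,
    #   level 3: a low-bit signed mantissa per element.
    groups = x.reshape(-1, group_size)
    t_scale = np.abs(groups).max() + 1e-12          # per-tensor scale
    xn = groups / t_scale                           # now roughly in [-1, 1]
    g_max = np.abs(xn).max(axis=1, keepdims=True) + 1e-12
    exp = np.clip(np.ceil(np.log2(g_max)), -(2 ** exp_bits - 1), 0)
    levels = 2 ** mant_bits - 1                     # mantissa magnitude levels
    mant = np.clip(np.rint(xn / 2.0 ** exp * levels), -levels, levels)
    return mant.astype(np.int8), exp.astype(np.int8), t_scale

def mls_dequantize(mant, exp, t_scale, mant_bits=4):
    levels = 2 ** mant_bits - 1
    return mant / levels * 2.0 ** exp * t_scale

x = np.random.randn(4, 64).astype(np.float32)
mant, exp, scale = mls_quantize(x.ravel())
x_hat = mls_dequantize(mant, exp, scale).reshape(x.shape)
print("max reconstruction error:", np.abs(x - x_hat).max())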
Related papers
- Towards Accurate and Efficient Sub-8-Bit Integer Training [24.853958178296587]
Quantization enables low-bitwidth formats in neural network training.
Recent methods have developed new data formats and additional pre-processing operations on quantizers.
It remains quite challenging to achieve high accuracy and efficiency simultaneously.
arXiv Detail & Related papers (2024-11-17T03:32:36Z)
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC)
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
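As background for the K-FAC entry above, here is a minimal NumPy sketch of the standard Kronecker-factored preconditioning for a single dense layer (not the weight-sharing variants the paper analyzes); the random arrays a and g simply stand in for layer inputs and backpropagated output gradients.

import numpy as np

# Fisher approximation for a dense layer: F ~= A (x) G with A = E[a a^T]
# and G = E[g g^T], giving the preconditioned gradient G^{-1} dW A^{-1}.
rng = np.random.default_rng(0)
batch, d_in, d_out = 128, 32, 16
a = rng.normal(size=(batch, d_in))       # layer inputs (activations)
g = rng.normal(size=(batch, d_out))      # backpropagated output gradients

dW = g.T @ a / batch                     # plain gradient w.r.t. W (d_out x d_in)
A = a.T @ a / batch                      # input second-moment factor
G = g.T @ g / batch                      # output-gradient second-moment factor

damping = 1e-3                           # Tikhonov damping for invertibility
A_inv = np.linalg.inv(A + damping * np.eye(d_in))
G_inv = np.linalg.inv(G + damping * np.eye(d_out))

precond_dW = G_inv @ dW @ A_inv          # K-FAC preconditioned update direction
print(precond_dW.shape)                  # (16, 32)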
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
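The guarantee described above boils down to bounding the accumulator against the worst-case dot product. A back-of-the-envelope sketch, assuming signed weights, unsigned activations, and dot-product length k (the paper's actual constraint is imposed during training and may differ):

import math

def min_accumulator_bits(k, weight_bits, act_bits):
    # Worst-case |sum| for k products of signed weights and unsigned activations.
    max_w = 2 ** (weight_bits - 1)        # e.g. |-8| for 4-bit signed weights
    max_a = 2 ** act_bits - 1             # e.g. 15 for 4-bit unsigned activations
    bound = k * max_w * max_a
    return math.ceil(math.log2(bound + 1)) + 1   # +1 for the sign bit

# 3x3 conv with 128 input channels, 4-bit weights and activations:
print(min_accumulator_bits(3 * 3 * 128, 4, 4))   # -> 19 bits for this example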
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
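One simple way such a decomposition can work, sketched below with assumed names: each bit-plane of an unsigned low-bit tensor is mapped to a {-1, +1} branch with a power-of-two coefficient plus a constant offset. The paper's exact encoding and acceleration scheme may differ.

import numpy as np

def binary_branches(q, bits):
    # Decompose an unsigned integer tensor q in [0, 2**bits - 1] into
    # `bits` tensors with entries in {-1, +1} plus per-branch coefficients
    # and a constant offset, so that q = sum_i coef[i] * s[i] + offset.
    branches, coefs = [], []
    for i in range(bits):
        b = (q >> i) & 1                  # i-th bit-plane in {0, 1}
        branches.append(2 * b - 1)        # map to {-1, +1}
        coefs.append(2.0 ** i / 2.0)      # 2^i * (s+1)/2 = 2^(i-1) s + 2^(i-1)
    offset = (2 ** bits - 1) / 2.0        # sum of the 2^(i-1) constants
    return branches, coefs, offset

bits = 3
q = np.random.randint(0, 2 ** bits, size=(4, 4))
s, c, off = binary_branches(q, bits)
recon = sum(ci * si for ci, si in zip(c, s)) + off
assert np.allclose(recon, q)              # exact reconstruction of the codes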
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of activations by 12x and enables training with a 6.6x to 14x larger batch size.
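A minimal sketch of the idea of 2-bit compressed activation storage, assuming per-group min/max scaling and unbiased stochastic rounding; a real implementation such as ActNN additionally bit-packs four 2-bit codes per byte and hooks into autograd.

import numpy as np

def compress_2bit(act, group_size=256, rng=np.random.default_rng(0)):
    # Stochastically round activations to 2 bits (4 levels) per group,
    # keeping per-group offset/scale so they can be dequantized in backward.
    g = act.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    scale = (g.max(axis=1, keepdims=True) - lo) / 3.0 + 1e-12   # levels 0..3
    norm = (g - lo) / scale
    q = np.floor(norm + rng.random(norm.shape))     # unbiased: E[q] == norm
    return np.clip(q, 0, 3).astype(np.uint8), lo, scale

def decompress_2bit(q, lo, scale, shape):
    return (q * scale + lo).reshape(shape)

act = np.random.randn(8, 256).astype(np.float32)
q, lo, sc = compress_2bit(act)
act_hat = decompress_2bit(q, lo, sc, act.shape)
print("mean abs error:", np.abs(act - act_hat).mean())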
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference [7.886868529510128]
Quantization maps floating-point weights and activations in a trained model to low-bitwidth integer values using scale factors.
Excessive quantization, reducing precision too aggressively, results in accuracy degradation.
Per-vector scale factors can be implemented with low-bitwidth integers when using a two-level quantization scheme.
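The two-level scheme mentioned above can be sketched as follows: an integer scale per vector sits on top of a single floating-point scale per tensor, so the per-vector scales themselves stay low-bit. Vector length, bit-widths, and function names below are illustrative assumptions.

import numpy as np

def vs_quant(w, vec_len=16, bits=4, scale_bits=4):
    v = w.reshape(-1, vec_len)
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    smax = 2 ** scale_bits - 1                      # integer per-vector scale range
    per_vec = np.abs(v).max(axis=1, keepdims=True) / qmax + 1e-12
    coarse = per_vec.max() / smax                   # per-tensor FP scale
    s_int = np.clip(np.rint(per_vec / coarse), 1, smax)   # low-bit integer scales
    q = np.clip(np.rint(v / (s_int * coarse)), -qmax, qmax).astype(np.int8)
    return q, s_int.astype(np.uint8), coarse

def vs_dequant(q, s_int, coarse, shape):
    return (q * s_int * coarse).reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
q, s, c = vs_quant(w.ravel())
w_hat = vs_dequant(q, s, c, w.shape)
print("rmse:", np.sqrt(np.mean((w - w_hat) ** 2)))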
arXiv Detail & Related papers (2021-02-08T19:56:04Z)
- Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks [2.666640112616559]
We propose an in-training quantization method for neural network models.
Our method calculates a bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy.
We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet with VGG19/ResNet18 architectures.
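As a rough illustration of the activation-density idea only (the paper's actual bit-width rule is derived during training and is more principled than this heuristic), a layer's density can be measured as the fraction of non-zero post-ReLU activations and mapped to a candidate bit-width:

import numpy as np

def bitwidth_from_density(post_relu, candidates=(2, 4, 8)):
    # Heuristic sketch: denser non-zero activations get more bits; layers
    # that are mostly zero after ReLU get fewer bits.
    density = np.count_nonzero(post_relu) / post_relu.size
    idx = min(int(density * len(candidates)), len(candidates) - 1)
    return candidates[idx]

layer_act = np.maximum(np.random.randn(1000), 0)   # toy post-ReLU activations
print(bitwidth_from_density(layer_act))            # -> 4 for ~50% density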
arXiv Detail & Related papers (2021-01-12T09:01:44Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
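A toy sketch of a progressive precision schedule in the spirit of FracTrain, assuming a fixed list of (forward, backward) bit-width stages; the actual method also adapts precision dynamically per input.

def precision_schedule(epoch, total_epochs, stages=((3, 3), (4, 4), (6, 6), (8, 8))):
    # The (forward, backward) bit-widths grow as training progresses.
    stage = min(int(epoch / total_epochs * len(stages)), len(stages) - 1)
    return stages[stage]

for e in (0, 30, 60, 90):
    print(e, precision_schedule(e, 100))
# -> (3, 3), (4, 4), (6, 6), (8, 8) respectively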
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination [11.173092834726528]
It is desirable to directly train a compact neural network from scratch with low memory and low computational cost.
Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large-size neural networks.
This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks.
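A minimal sketch of training with a factorized layer rather than compressing afterwards: only the low-rank factors are ever stored and updated. The class and parameter names are assumptions; the paper additionally uses a Bayesian prior to determine the ranks automatically.

import numpy as np

class LowRankLinear:
    # W is never materialized at full rank; only U (n_in x r) and
    # V (r x n_out) are stored and updated.
    def __init__(self, n_in, n_out, rank, rng=np.random.default_rng(0)):
        self.U = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, rank))
        self.V = rng.normal(scale=1.0 / np.sqrt(rank), size=(rank, n_out))

    def forward(self, x):                 # x: (batch, n_in)
        return (x @ self.U) @ self.V      # O(batch * r * (n_in + n_out)) FLOPs

layer = LowRankLinear(512, 512, rank=16)  # ~16x fewer weights than 512x512
x = np.random.randn(8, 512)
print(layer.forward(x).shape)             # (8, 512)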
arXiv Detail & Related papers (2020-10-17T01:23:26Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline is due to activation quantization and replace the conventional ReLU with a Bounded ReLU.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
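A small sketch of why a Bounded ReLU helps integer-only inference, assuming a clip bound of 6 and uniform activation quantization (the paper's bound and quantizer details may differ): clipping fixes the activation range, so a single static integer scale covers it without outliers.

import numpy as np

def bounded_relu(x, bound=6.0):
    # Clipping the activation range makes uniform integer quantization
    # well-posed: every value maps into [0, bound].
    return np.clip(x, 0.0, bound)

def quantize_act(x, bound=6.0, bits=8):
    scale = bound / (2 ** bits - 1)
    return np.rint(bounded_relu(x) / scale).astype(np.uint8), scale

x = np.random.randn(4) * 4
q, s = quantize_act(x)
print(q, q * s)   # integer codes and their dequantized values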
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks [13.929168096016957]
We introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers.
Reduced bit precision allows for a larger effective memory and increased computational speed.
We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models.
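A rough emulation of casting to an FP8-like format after a per-tensor exponent shift, to illustrate why moving the tensor statistics into the representable range matters; the bit layout (5-bit exponent, 2-bit mantissa here) and the shift rule are assumptions, not the paper's exact shifted-and-squeezed transform.

import numpy as np

def fake_fp8(x, exp_bits=5, man_bits=2, shift=0.0):
    # Emulate an FP8-like cast (sign + exp_bits exponent + man_bits mantissa)
    # applied after scaling the tensor by 2**shift, then undo the shift.
    y = x * 2.0 ** shift
    sign = np.sign(y)
    mag = np.abs(y) + 1e-45                       # avoid log2(0)
    e = np.clip(np.floor(np.log2(mag)),
                -(2 ** (exp_bits - 1)) + 1, 2 ** (exp_bits - 1))
    m = np.rint(mag / 2.0 ** e * 2 ** man_bits) / 2 ** man_bits
    return sign * m * 2.0 ** e / 2.0 ** shift

x = (np.random.randn(5) * 1e-3).astype(np.float32)
shift = -np.log2(np.abs(x).max())                 # center the tensor near 1.0
print(x)
print(fake_fp8(x, shift=shift))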
arXiv Detail & Related papers (2020-01-16T06:38:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.