Ultra-low Precision Multiplication-free Training for Deep Neural
Networks
- URL: http://arxiv.org/abs/2302.14458v1
- Date: Tue, 28 Feb 2023 10:05:45 GMT
- Title: Ultra-low Precision Multiplication-free Training for Deep Neural
Networks
- Authors: Chang Liu, Rui Zhang, Xishan Zhang, Yifan Hao, Zidong Du, Xing Hu,
Ling Li, Qi Guo
- Abstract summary: In training, the linear layers consume the most energy because of the intense use of energy-consuming full-precision multiplication.
We propose an Adaptive Layer-wise Scaling PoT Quantization (ALS-POTQ) method and a Multiplication-Free MAC (MF-MAC) to replace all of the FP32 multiplications.
In our training scheme, none of the above methods introduces extra multiplications, so we reduce the energy consumption of linear layers during training by up to 95.8%.
- Score: 20.647925576138807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training for deep neural networks (DNNs) demands immense energy
consumption, which restricts the development of deep learning as well as
increases carbon emissions. Thus, the study of energy-efficient training for
DNNs is essential. In training, the linear layers consume the most energy
because of the intense use of energy-consuming full-precision (FP32)
multiplication in multiply-accumulate (MAC) operations. Prior energy-efficient
works try to decrease the precision of the multiplication or to replace it with
cheaper operations such as addition or bitwise shift, so as to reduce the energy
consumption of FP32 multiplications. However, these works cannot replace all of
the FP32 multiplications during both forward and backward propagation with
low-precision, energy-efficient operations. In this work, we propose an Adaptive
Layer-wise Scaling PoT
Quantization (ALS-POTQ) method and a Multiplication-Free MAC (MF-MAC) to
replace all of the FP32 multiplications with INT4 additions and 1-bit XOR
operations. In addition, we propose Weight Bias Correction and Parameterized
Ratio Clipping techniques for stable training and improved accuracy. In our
training scheme, none of the above methods introduces extra multiplications, so
we reduce the energy consumption of linear layers during training by up to
95.8%. Experimentally, we achieve an accuracy degradation of less than 1% for
CNN models on ImageNet and a Transformer model on the WMT En-De task. In
summary, we significantly outperform existing methods in both energy efficiency
and accuracy.
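The MF-MAC idea lends itself to a short illustration: once an operand is constrained to a signed power of two, an FP32 product collapses into a low-bit exponent addition plus a 1-bit sign XOR. The sketch below is a minimal approximation of that idea, not the paper's implementation: it assumes both operands are PoT-quantized with a simple max-magnitude layer-wise scale (the paper's ALS-POTQ chooses the scale adaptively during training), and the helper names `pot_quantize` and `mf_mac` are invented for the example.

```python
import numpy as np

def pot_quantize(x, exp_bits=4):
    """Illustrative power-of-two (PoT) quantization: x ~= scale * (-1)**sign * 2**(-e).

    The layer-wise scale is simplified here to the tensor's maximum magnitude;
    ALS-POTQ in the paper adapts this scale per layer.
    """
    scale = np.max(np.abs(x)) + 1e-12
    sign = (x < 0).astype(np.int8)                      # 1-bit sign
    mag = np.abs(x) / scale
    # Round the log2 magnitude to an integer exponent representable in exp_bits bits.
    e = np.clip(np.round(-np.log2(np.maximum(mag, 2.0 ** -(2 ** exp_bits)))),
                0, 2 ** exp_bits - 1).astype(np.int8)   # INT4 exponent
    return sign, e, scale

def mf_mac(a, w, exp_bits=4):
    """Multiplication-free dot product: every elementwise product is formed by
    one INT4 exponent addition and one 1-bit sign XOR; only the final
    accumulation decodes the +/- 2**(-e) terms."""
    sa, ea, scale_a = pot_quantize(a, exp_bits)
    sw, ew, scale_w = pot_quantize(w, exp_bits)
    sign = np.bitwise_xor(sa, sw)                       # sign "multiply" -> XOR
    e = ea + ew                                         # magnitude "multiply" -> integer add
    terms = np.where(sign == 1, -1.0, 1.0) * 2.0 ** (-e.astype(np.float64))
    return scale_a * scale_w * terms.sum()              # accumulate

# Compare the multiplication-free estimate with the FP32 reference dot product.
rng = np.random.default_rng(0)
a, w = rng.standard_normal(64), rng.standard_normal(64)
print(mf_mac(a, w), float(a @ w))
```

The point of the sketch is only that no floating-point multiplication appears between the quantized operands; the paper's Weight Bias Correction and Parameterized Ratio Clipping, which keep such training stable and accurate, are not modeled here.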
Related papers
- Hadamard Domain Training with Integers for Class Incremental Quantized
Learning [1.4416751609100908]
Continual learning can be cost-prohibitive for resource-constrained edge platforms.
We propose a technique that uses Hadamard transforms to enable low-precision training with only integer matrix multiplications (a minimal sketch of this idea appears after this list).
We achieve less than 0.5% and 3% accuracy degradation while quantizing all matrix multiplication inputs down to 4 bits with 8-bit accumulators.
arXiv Detail & Related papers (2023-10-05T16:52:59Z) - Minimizing Energy Consumption of Deep Learning Models by Energy-Aware
Training [26.438415753870917]
We propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training.
We demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
arXiv Detail & Related papers (2023-07-01T15:44:01Z) - DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search over quantization bit-widths with training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experimental results show that DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z) - Energy awareness in low precision neural networks [41.69995577490698]
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices.
We present PANN, a simple approach for approximating any full-precision network by a low-power fixed-precision variant.
In contrast to previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network, even when working at the power budget of a 2-bit quantized variant.
arXiv Detail & Related papers (2022-02-06T14:44:55Z) - On the Tradeoff between Energy, Precision, and Accuracy in Federated
Quantized Neural Networks [68.52621234990728]
Federated learning (FL) over wireless networks requires balancing accuracy, energy efficiency, and precision.
We propose a quantized FL framework that represents data with a finite level of precision in both local training and uplink transmission.
Our framework can reduce energy consumption by up to 53% compared to a standard FL model.
arXiv Detail & Related papers (2021-11-15T17:00:03Z) - Positive/Negative Approximate Multipliers for DNN Accelerators [3.1921317895626493]
We present a filter-oriented approximation method to map the weights to the appropriate modes of the approximate multiplier.
Our approach achieves 18.33% energy gains on average across 7 NNs on 4 different datasets for a maximum accuracy drop of only 1%.
arXiv Detail & Related papers (2021-07-20T09:36:24Z) - Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework that combines a logarithmic number system (LNS) with a multiplicative weight update method, termed LNS-Madam; a short log-domain sketch appears after this list.
arXiv Detail & Related papers (2021-06-26T00:32:17Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, which forces models to be stored in external dynamic random-access memory (DRAM).
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z) - Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation as well as low-precision quantization.
arXiv Detail & Related papers (2020-06-24T18:23:10Z) - ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for
Deep Learning [1.2019888796331233]
Matrix-vector multiplications (MVM) and vector-vector outer products (VVOP) are the two most expensive operations in the training of deep neural networks (DNNs).
We introduce efficient stochastic computing (SC) techniques for the weight update in DNNs, supporting the activation functions required by many state-of-the-art networks.
Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling.
Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier, ESSOP is 82.2% and 93.7% better in energy
arXiv Detail & Related papers (2020-03-25T07:54:42Z)
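For the Hadamard-domain entry above (Hadamard Domain Training with Integers), the mechanism worth spelling out is that a Hadamard matrix H satisfies H H^T = nI, so a matrix product can be computed on Hadamard-rotated operands, and the rotation spreads outliers before low-bit quantization. The sketch below is an assumption-based illustration rather than that paper's implementation: it quantizes the rotated operands to 4-bit integers but does not model the 8-bit accumulators, and the helper names are invented.

```python
import numpy as np
from scipy.linalg import hadamard

def int_quantize(x, bits=4):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.round(x / scale).astype(np.int32), scale

def hadamard_domain_matmul(a, b, bits=4):
    """Approximate a @ b using an integer-only matmul on Hadamard-rotated,
    low-bit operands. Since H @ H.T = n * I, rotating both sides of the
    inner dimension leaves the product unchanged up to a 1/n factor, while
    spreading outliers so a 4-bit grid captures the values better."""
    n = a.shape[1]                              # inner dimension, power of two
    h = hadamard(n)                             # +/-1 Hadamard matrix
    qa, sa = int_quantize(a @ h, bits)          # rotate, then quantize activations
    qb, sb = int_quantize(h.T @ b, bits)        # rotate, then quantize weights
    acc = qa @ qb                               # integer multiply-accumulate only
    return (sa * sb / n) * acc                  # undo scales and the n*I factor

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 64))
b = rng.standard_normal((64, 16))
print(np.abs(hadamard_domain_matmul(a, b) - a @ b).max())  # small residual error
```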
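For the LNS-Madam entry, the pairing of a logarithmic number system with a multiplicative weight update can be made concrete with one line of algebra: if a weight is stored as log2|w| (sign tracked separately), a multiplicative update w <- w * 2**(-lr * sign(g)) becomes a plain subtraction on the stored log. The snippet below is a hedged sketch of that log-domain step under these assumptions; the exact update rule in LNS-Madam differs, and `lns_madam_step` is an invented name.

```python
import numpy as np

def lns_madam_step(log2_w, grad, lr=0.01):
    """Sketch of a multiplicative (Madam-style) update carried out in the log
    domain of an LNS: subtracting lr * sign(grad) from log2|w| is the same as
    multiplying |w| by 2**(-lr * sign(grad)). Weight signs would be tracked
    separately in a real LNS implementation."""
    return log2_w - lr * np.sign(grad)

# Positive example weights, stored by their base-2 logarithms.
w = np.array([0.5, 0.25, 1.0])
grad = np.array([1.0, -2.0, 0.5])
log2_w = lns_madam_step(np.log2(w), grad)
print(2.0 ** log2_w)  # decoded weights after one log-domain update
```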
This list is automatically generated from the titles and abstracts of the papers in this site.