AdderNet: Do We Really Need Multiplications in Deep Learning?
- URL: http://arxiv.org/abs/1912.13200v6
- Date: Thu, 1 Jul 2021 04:46:58 GMT
- Title: AdderNet: Do We Really Need Multiplications in Deep Learning?
- Authors: Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian,
Chang Xu
- Abstract summary: We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
- Score: 159.174891462064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with the cheap addition operation, multiplication is of much
higher computational complexity. The widely used convolutions in deep neural
networks are exactly cross-correlations that measure the similarity between input
features and convolution filters, which involves massive multiplications between
floating-point values. In this paper, we present adder networks (AdderNets) to trade
these massive multiplications in deep neural networks, especially convolutional
neural networks (CNNs), for much cheaper additions to reduce computation costs.
In AdderNets, we take the $\ell_1$-norm distance between filters and the input
feature as the output response. The influence of this new similarity measure on
the optimization of neural networks has been thoroughly analyzed. To achieve
better performance, we develop a special back-propagation approach for
AdderNets by investigating the full-precision gradient. We then propose an
adaptive learning rate strategy to enhance the training procedure of AdderNets
according to the magnitude of each neuron's gradient. As a result, the proposed
AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50
on the ImageNet dataset without any multiplication in the convolution layers. The
code is publicly available at: https://github.com/huaweinoah/AdderNet.
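To make the core idea concrete: where a standard convolution computes a multiply-accumulate between a filter and each input patch, an AdderNet layer outputs the negative $\ell_1$ distance between them, so the similarity computation itself needs only subtractions, absolute values, and additions. The snippet below is a minimal PyTorch sketch of that forward pass, not the authors' released implementation; the function name `adder_conv2d` and the unfold-based layout are illustrative assumptions, and the special back-propagation and adaptive learning rate described in the abstract are omitted.

```python
# Minimal sketch of an adder "convolution": the output response is the negative
# l1-norm distance between each filter and each input patch (see the abstract).
# Illustrative only; an efficient implementation would use a dedicated kernel.
import torch
import torch.nn.functional as F


def adder_conv2d(x, weight, stride=1, padding=0):
    """x: (N, C_in, H, W), weight: (C_out, C_in, kH, kW)."""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape

    # Extract sliding patches: (N, C_in*kH*kW, L), L = number of spatial positions.
    patches = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
    flat_w = weight.reshape(c_out, -1)                   # (C_out, C_in*kH*kW)

    # Negative l1 distance between every filter and every patch:
    # Y[n, o, l] = -sum_k |patch[n, k, l] - W[o, k]|
    diff = patches.unsqueeze(1) - flat_w.unsqueeze(0).unsqueeze(-1)  # (N, C_out, K, L)
    out = -diff.abs().sum(dim=2)                                     # (N, C_out, L)

    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return out.view(n, c_out, h_out, w_out)


# Toy usage: a 3x3 adder layer on a random feature map.
x = torch.randn(2, 8, 16, 16)
w = torch.randn(16, 8, 3, 3)
y = adder_conv2d(x, w, stride=1, padding=1)
print(y.shape)  # torch.Size([2, 16, 16, 16])
```

Training such layers is where the paper's contributions lie: the abstract points to a full-precision gradient for back-propagation and a per-layer adaptive learning rate tied to gradient magnitude; those details are in the paper and the linked repository.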
Related papers
- RelChaNet: Neural Network Feature Selection using Relative Change Scores [0.0]
We introduce RelChaNet, a novel and lightweight feature selection algorithm that uses neuron pruning and regrowth in the input layer of a dense neural network.
Our approach generally outperforms the current state-of-the-art methods, and in particular improves the average accuracy by 2% on the MNIST dataset.
arXiv Detail & Related papers (2024-10-03T09:56:39Z) - Neural Network Optimization for Reinforcement Learning Tasks Using
Sparse Computations [3.4328283704703866]
This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning tasks.
It significantly reduces the number of multiplications when running neural networks.
arXiv Detail & Related papers (2022-01-07T18:09:23Z) - Adder Neural Networks [75.54239599016535]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions.
In AdderNets, we take the $\ell_p$-norm distance between filters and input feature as the output response.
We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2021-05-29T04:02:51Z) - AdderNet and its Minimalist Hardware Design for Energy-Efficient
Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using an adder convolutional neural network (AdderNet).
The whole AdderNet can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z) - Kernel Based Progressive Distillation for Adder Neural Networks [71.731127378807]
Adder Neural Networks (ANNs), which only contain additions, bring us a new way of developing deep neural networks with low energy consumption.
However, there is an accuracy drop when replacing all convolution filters with adder filters.
We present a novel method for further improving the performance of ANNs without increasing the trainable parameters.
arXiv Detail & Related papers (2020-09-28T03:29:19Z) - Add a SideNet to your MainNet [0.0]
We develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet.
Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user-determined threshold, and only passes the input along to the large MainNet for further processing if its confidence is too low (a simplified sketch of this gating appears after the related-papers list below).
Experimental results show that simple single-hidden-layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks.
arXiv Detail & Related papers (2020-07-14T19:25:32Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
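As referenced above, here is a hedged sketch of the confidence-gated routing described in the "Add a SideNet to your MainNet" entry: a small classifier answers when its softmax confidence clears a threshold and otherwise defers to the large network. This is an illustration only, not the SideNet authors' code; `GatedClassifier`, the stand-in networks, and the raw-input wiring are assumptions (in the paper the SideNet is attached to the pretrained MainNet itself, a detail this toy version omits).

```python
# Hedged illustration of confidence-gated early exit: a cheap SideNet answers
# when its softmax confidence clears a user-chosen threshold, otherwise the
# input is handed to the large MainNet.
import torch
import torch.nn as nn


class GatedClassifier(nn.Module):
    def __init__(self, sidenet: nn.Module, mainnet: nn.Module, threshold: float = 0.9):
        super().__init__()
        self.sidenet = sidenet      # small classifier (e.g. a single-hidden-layer MLP)
        self.mainnet = mainnet      # large pretrained network
        self.threshold = threshold  # user-determined confidence threshold

    @torch.no_grad()
    def forward(self, x):
        side_logits = self.sidenet(x)
        confidence, prediction = side_logits.softmax(dim=-1).max(dim=-1)
        if confidence.item() >= self.threshold:   # confident enough: early exit
            return prediction
        return self.mainnet(x).argmax(dim=-1)     # otherwise defer to the MainNet


# Toy usage with random stand-in networks (one example, 32 features, 10 classes).
side = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
main = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
model = GatedClassifier(side, main, threshold=0.9)
print(model(torch.randn(1, 32)))
```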
This list is automatically generated from the titles and abstracts of the papers on this site.