Adder Neural Networks
- URL: http://arxiv.org/abs/2105.14202v2
- Date: Tue, 1 Jun 2021 06:16:59 GMT
- Title: Adder Neural Networks
- Authors: Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang
- Abstract summary: We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions.
In AdderNets, we take the $\ell_1$-norm distance between filters and input features as the output response.
We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
- Score: 75.54239599016535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with the cheap addition operation, the multiplication operation is of
much higher computational complexity. The widely used convolutions in deep neural
networks are exactly cross-correlations that measure the similarity between input
features and convolution filters, which involves massive multiplications between
floating-point values. In this paper, we present adder networks (AdderNets) to trade
these massive multiplications in deep neural networks, especially convolutional
neural networks (CNNs), for much cheaper additions to reduce computation costs.
In AdderNets, we take the $\ell_1$-norm distance between filters and input
features as the output response. The influence of this new similarity measure on
the optimization of neural networks has been thoroughly analyzed. To achieve
better performance, we develop a special training approach for AdderNets by
investigating the $\ell_p$-norm. We then propose an adaptive learning rate
strategy to enhance the training procedure of AdderNets according to the
magnitude of each neuron's gradient. As a result, the proposed AdderNets can
achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the
ImageNet dataset without any multiplication in the convolutional layers. Moreover,
we develop a theoretical foundation for AdderNets by showing that both the
single hidden layer AdderNet and the width-bounded deep AdderNet with ReLU
activation functions are universal function approximators. These results match
those of the traditional neural networks using the more complex multiplication
units. An approximation bound for AdderNets with a single hidden layer is also
presented.
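As a rough illustration of the two core ideas above, the sketch below (PyTorch) computes an adder layer's output as the negative $\ell_1$ distance between each input patch and each filter instead of a multiply-accumulate cross-correlation, and rescales a layer's update by its gradient magnitude. Function names and the exact scaling formula are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def adder2d_forward(x, weight, stride=1, padding=0):
    """Adder 'convolution': the output is the negative L1 distance between
    each input patch and each filter, rather than a multiply-accumulate.

    x:      (N, C_in, H, W) input feature map
    weight: (C_out, C_in, kH, kW) adder filters
    """
    n, _, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1

    # Extract sliding patches: (N, C_in*kH*kW, L) with L = H_out * W_out.
    patches = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
    patches = patches.transpose(1, 2)            # (N, L, D)
    w_flat = weight.reshape(c_out, -1)           # (C_out, D)

    # Broadcast to (N, L, C_out, D), then sum |x - w| over D.
    out = -(patches.unsqueeze(2) - w_flat).abs().sum(dim=3)   # (N, L, C_out)
    return out.transpose(1, 2).reshape(n, c_out, h_out, w_out)


def adaptive_layer_lr(grad, base_lr):
    """Per-layer learning-rate scaling in the spirit of the abstract: each
    adder layer's update is rescaled by the magnitude of its gradient.
    The sqrt(k) / ||grad||_2 form is one plausible choice, stated here as an
    assumption rather than the paper's exact formula.
    """
    k = grad.numel()
    return base_lr * (k ** 0.5) / (grad.norm(p=2) + 1e-12)


# Usage: a 3x3 adder layer over 8x8 RGB feature maps.
x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
y = adder2d_forward(x, w, stride=1, padding=1)   # -> shape (2, 4, 8, 8)
```

Note that naive autograd through the absolute value yields sign gradients; the special training approach mentioned in the abstract (investigating the $\ell_p$-norm and using full-precision gradients) is not implemented in this sketch.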
Related papers
- Redistribution of Weights and Activations for AdderNet Quantization [33.78204350112026]
Adder Neural Network (AdderNet) provides a new way for developing energy-efficient neural networks.
To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet.
We propose a new quantization algorithm by redistributing the weights and the activations.
arXiv Detail & Related papers (2022-12-20T12:24:48Z) - An Empirical Study of Adder Neural Networks for Object Detection [67.64041181937624]
Adder neural networks (AdderNets) have shown impressive performance on image classification with only addition operations.
We present an empirical study of AdderNets for object detection.
arXiv Detail & Related papers (2021-12-27T11:03:13Z) - AdderNet and its Minimalist Hardware Design for Energy-Efficient
Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z) - Add a SideNet to your MainNet [0.0]
We develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet.
Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user-determined threshold, and only passes the input along to the large MainNet for further processing if its confidence is too low (a minimal sketch of this gating appears after this list).
Experimental results show that simple single-hidden-layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks.
arXiv Detail & Related papers (2020-07-14T19:25:32Z) - ReActNet: Towards Precise Binary Neural Network with Generalized
Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z) - AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2019-12-31T06:56:47Z)
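For the SideNet/MainNet entry above, here is a minimal sketch of the confidence-gated early exit it describes; the class name, threshold default, and single-example gating are assumptions made for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SideGatedClassifier(nn.Module):
    """A small SideNet answers when its softmax confidence clears a
    user-determined threshold; otherwise the input goes to the large MainNet."""

    def __init__(self, sidenet: nn.Module, mainnet: nn.Module, threshold: float = 0.9):
        super().__init__()
        self.sidenet = sidenet
        self.mainnet = mainnet
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        # Single-example inference for simplicity; a batched version would
        # route confident and unconfident samples separately.
        side_logits = self.sidenet(x)
        confidence = F.softmax(side_logits, dim=-1).max().item()
        if confidence >= self.threshold:
            return side_logits        # confident: skip the expensive MainNet
        return self.mainnet(x)        # uncertain: fall back to the MainNet
```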