Kernel Based Progressive Distillation for Adder Neural Networks
- URL: http://arxiv.org/abs/2009.13044v3
- Date: Thu, 15 Oct 2020 03:39:58 GMT
- Title: Kernel Based Progressive Distillation for Adder Neural Networks
- Authors: Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, Yunhe Wang
- Abstract summary: Adder Neural Networks (ANNs), which contain only additions, offer a new way of developing deep neural networks with low energy consumption.
There is an accuracy drop when all convolution filters are replaced with adder filters.
We present a novel method for further improving the performance of ANNs without increasing the trainable parameters.
- Score: 71.731127378807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adder Neural Networks (ANNs), which contain only additions, offer a new way
of developing deep neural networks with low energy consumption. Unfortunately,
there is an accuracy drop when all convolution filters are replaced with adder
filters. The main reason is the difficulty of optimizing ANNs under the
$\ell_1$-norm, for which the gradient estimated in back propagation is
inaccurate. In this paper, we present a progressive kernel based knowledge
distillation (PKKD) method that further improves the performance of ANNs
without increasing the trainable parameters. A convolutional neural network
(CNN) with the same architecture is simultaneously initialized and trained as a
teacher network; the features and weights of the ANN and the CNN are then
transformed into a new space to eliminate the accuracy drop. Their similarity
is measured in a higher-dimensional space using a kernel based method, which
disentangles the difference between their distributions. Finally, the desired
ANN is learned progressively from both the ground-truth labels and the teacher.
The effectiveness of the proposed method for learning ANNs with higher
performance is then verified on several benchmarks. For instance, the ANN-50
trained with the proposed PKKD method obtains 76.8\% top-1 accuracy on the
ImageNet dataset, which is 0.6\% higher than that of ResNet-50.
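As a rough illustration of the distillation described above, the sketch below maps ANN and CNN features through a kernel (here a Gaussian kernel, an assumption) and combines the resulting kernel-matching term with ground-truth and teacher supervision. The function names, loss weights, and the fixed weighting (standing in for the paper's progressive schedule) are illustrative, not the exact PKKD formulation.

```python
# Hypothetical sketch of kernel based feature distillation in the spirit of PKKD.
# The Gaussian kernel, layer choice, and fixed loss weights are assumptions.
import torch
import torch.nn.functional as F

def gaussian_gram(feats, sigma=1.0):
    """Pairwise Gaussian kernel matrix over a batch of flattened features."""
    flat = feats.flatten(1)                      # (B, D)
    sq_dist = torch.cdist(flat, flat, p=2) ** 2  # (B, B) squared distances
    return torch.exp(-sq_dist / (2 * sigma ** 2))

def pkkd_style_loss(ann_feats, cnn_feats, ann_logits, cnn_logits, labels,
                    alpha=0.5, temperature=4.0):
    # 1) Compare ANN and CNN features after a kernel mapping, so their
    #    differently distributed activations are matched in a higher-dimensional space.
    kernel_loss = F.mse_loss(gaussian_gram(ann_feats),
                             gaussian_gram(cnn_feats.detach()))

    # 2) Ground-truth supervision for the ANN student.
    ce_loss = F.cross_entropy(ann_logits, labels)

    # 3) Soft-label guidance from the simultaneously trained CNN teacher.
    kd_loss = F.kl_div(
        F.log_softmax(ann_logits / temperature, dim=1),
        F.softmax(cnn_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # A fixed alpha stands in for the paper's progressive weighting schedule.
    return ce_loss + alpha * (kd_loss + kernel_loss)
```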
Related papers
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Joint A-SNN: Joint Training of Artificial and Spiking Neural Networks via Self-Distillation and Weight Factorization [12.1610509770913]
Spiking Neural Networks (SNNs) mimic the spiking nature of brain neurons.
We propose a joint training framework of ANN and SNN, in which the ANN can guide the SNN's optimization.
Our method consistently outperforms many other state-of-the-art training methods.
arXiv Detail & Related papers (2023-05-03T13:12:17Z)
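The Joint A-SNN entry above says only that the ANN guides the SNN's optimization; the following is a hedged sketch of one such guidance signal, logit distillation from the ANN to the SNN, with the loss weight, temperature, and model interfaces assumed for illustration rather than taken from the paper.

```python
# Hypothetical joint-training step in which an ANN guides a coupled SNN via
# logit distillation; model interfaces, the weight beta, and temperature T
# are illustrative assumptions, not the Joint A-SNN recipe itself.
import torch.nn.functional as F

def joint_step(ann_model, snn_model, images, labels, optimizer, beta=0.3, T=2.0):
    ann_logits = ann_model(images)
    snn_logits = snn_model(images)  # assumed to return rate-decoded logits

    # Both networks are supervised by the ground truth ...
    loss = F.cross_entropy(ann_logits, labels) + F.cross_entropy(snn_logits, labels)

    # ... and the SNN is additionally pulled toward the ANN's soft predictions.
    guide = F.kl_div(F.log_softmax(snn_logits / T, dim=1),
                     F.softmax(ann_logits.detach() / T, dim=1),
                     reduction="batchmean") * T * T
    loss = loss + beta * guide

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```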
- Desire Backpropagation: A Lightweight Training Algorithm for Multi-Layer Spiking Neural Networks based on Spike-Timing-Dependent Plasticity [13.384228628766236]
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks.
We present desire backpropagation, a method to derive the desired spike activity of all neurons, including the hidden ones.
We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached an accuracy of 98.41% and 87.56%, respectively.
arXiv Detail & Related papers (2022-11-10T08:32:13Z)
- SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking Neural Networks [117.56823277328803]
Spiking neural networks are efficient computation models for low-power environments.
We propose a SNN-to-ANN (SNN2ANN) framework to train the SNN in a fast and memory-efficient way.
Experiment results show that our SNN2ANN-based models perform well on the benchmark datasets.
arXiv Detail & Related papers (2022-06-19T16:52:56Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
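The DSR entry above states that spike non-differentiability makes SNN training hard but does not spell out its solution. Purely to illustrate that obstacle, here is the generic surrogate-gradient workaround (a different technique from DSR): the forward pass thresholds the membrane potential, while the backward pass substitutes a smooth derivative.

```python
# Generic surrogate-gradient spike function (NOT the DSR method): the forward
# pass is a hard threshold, the backward pass substitutes a smooth derivative
# so that error signals can flow through spiking neurons at all.
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()  # binary spikes

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Derivative of a fast sigmoid centred at the threshold (an assumption).
        surrogate = 1.0 / (1.0 + 10.0 * (v - ctx.threshold).abs()) ** 2
        return grad_output * surrogate, None

# Usage: spikes = SurrogateSpike.apply(membrane_potential)
```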
- Spiking neural networks trained via proxy [0.696125353550498]
We propose a new learning algorithm to train spiking neural networks (SNN) using conventional artificial neural networks (ANN) as proxy.
We couple an SNN and an ANN, made of integrate-and-fire (IF) and ReLU neurons respectively, with the same network architecture and shared synaptic weights.
By treating the rate-coded IF neuron as an approximation of ReLU, we backpropagate the error of the SNN through the proxy ANN to update the shared weights, simply by replacing the ANN's final output with that of the SNN.
arXiv Detail & Related papers (2021-09-27T17:29:51Z)
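A minimal sketch of the proxy idea described above, assuming the SNN and the ANN expose compatible forward interfaces and share their weight tensors: the loss is evaluated on the SNN's rate-coded output, but gradients flow through the differentiable ANN graph. The straight-through-style substitution below is one way to realize "replace the ANN output with the SNN output", not necessarily the authors' exact implementation.

```python
# Hypothetical proxy-training step: the loss is evaluated on the SNN's
# rate-coded output, but gradients flow through the differentiable ANN graph,
# and both networks are assumed to share the same weight tensors.
import torch
import torch.nn.functional as F

def proxy_step(ann, snn, images, labels, optimizer, timesteps=32):
    ann_out = ann(images)                 # ReLU network, differentiable
    with torch.no_grad():
        snn_out = snn(images, timesteps)  # IF network, returns firing rates

    # Replace the ANN's final output with the SNN's while keeping the ANN's
    # graph: the value comes from the SNN, the gradient from the ANN output.
    surrogate_out = ann_out + (snn_out - ann_out).detach()

    loss = F.cross_entropy(surrogate_out, labels)
    optimizer.zero_grad()
    loss.backward()                       # updates the shared weights
    optimizer.step()
    return loss.item()
```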
- A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration [11.014383784032084]
Spiking Neural Network (SNN) has been recognized as one of the next generation of neural networks.
We show that a proper way to calibrate the parameters during the conversion of ANN to SNN can bring significant improvements.
arXiv Detail & Related papers (2021-06-13T13:20:12Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE, a strategy that deepens backpropagation in transfer learning settings by re-initializing the fully-connected layer during fine-tuning.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
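A small sketch of a RIFLE-style schedule as described above: the fully-connected head is re-initialized during fine-tuning so that larger error signals keep reaching the earlier CNN layers. The re-initialization period, the `model.fc` attribute, and the init scheme are assumptions for illustration.

```python
# RIFLE-style fine-tuning loop sketch: the classifier head is periodically
# re-initialized so that larger error signals keep reaching earlier layers.
# The period, the `model.fc` attribute, and the init scheme are assumptions.
import torch.nn as nn

def rifle_finetune(model, train_one_epoch, epochs=30, period=10):
    for epoch in range(epochs):
        if epoch > 0 and epoch % period == 0:
            # `model.fc` is assumed to be the final fully-connected layer,
            # as in torchvision ResNets.
            nn.init.kaiming_normal_(model.fc.weight)
            nn.init.zeros_(model.fc.bias)
        train_one_epoch(model)
```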
- Training Deep Spiking Neural Networks [0.0]
Brain-inspired spiking neural networks (SNNs) with neuromorphic hardware may offer orders of magnitude higher energy efficiency.
We show that it is possible to train an SNN with the ResNet50 architecture on the CIFAR100 and Imagenette object recognition datasets.
The trained SNN falls behind the analogous ANN in accuracy but requires several orders of magnitude fewer inference time steps.
arXiv Detail & Related papers (2020-06-08T09:47:05Z)
- AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2019-12-31T06:56:47Z)
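The AdderNet entry above trades the multiply-accumulate of convolution for additions; the standard formulation computes the negative $\ell_1$ distance between each filter and each input patch. Below is a minimal, unoptimized sketch of that forward pass using an unfold-based layout, which is an implementation assumption rather than the authors' code.

```python
# Minimal adder-layer forward pass: the response is the negative L1 distance
# between each filter and each input patch, so the core operation needs only
# additions, subtractions, and absolute values (no multiplications).
import torch
import torch.nn.functional as F

def adder2d_forward(x, weight, stride=1, padding=0):
    b, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    patches = F.unfold(x, k, stride=stride, padding=padding)  # (B, C_in*k*k, L)
    w_flat = weight.view(c_out, -1)                           # (C_out, C_in*k*k)
    # -sum |X_patch - W| over the patch dimension, for every filter/patch pair.
    out = -(patches.unsqueeze(1) - w_flat.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    return out.view(b, c_out, h_out, w_out)
```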
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.