Selfish Sparse RNN Training
- URL: http://arxiv.org/abs/2101.09048v2
- Date: Thu, 28 Jan 2021 16:38:09 GMT
- Title: Selfish Sparse RNN Training
- Authors: Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy
- Abstract summary: We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results with various types of RNNs on the Penn TreeBank and Wikitext-2 datasets.
- Score: 13.165729746380816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse neural networks have been widely applied to reduce the necessary
resource requirements to train and deploy over-parameterized deep neural
networks. For inference acceleration, methods that induce sparsity from a
pre-trained dense network (dense-to-sparse) work effectively. Recently, dynamic
sparse training (DST) has been proposed to train sparse neural networks without
pre-training a dense network (sparse-to-sparse), so that the training process
can also be accelerated. However, previous sparse-to-sparse methods mainly
focus on Multilayer Perceptron Networks (MLPs) and Convolutional Neural
Networks (CNNs), failing to match the performance of dense-to-sparse methods for
Recurrent Neural Networks (RNNs). In this paper, we propose an approach to train
sparse RNNs with a fixed parameter count in a single run, without compromising
performance. During training, we allow RNN layers to have a non-uniform
redistribution of parameters across cell gates for better regularization.
Further, we introduce SNT-ASGD, a variant of the averaged stochastic gradient
optimizer, which significantly improves the performance of all sparse training
methods for RNNs. Using these strategies, we achieve state-of-the-art sparse
training results with various types of RNNs on the Penn TreeBank and Wikitext-2
datasets.
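The core mechanism described in the abstract is a prune-and-regrow cycle in which a layer's fixed parameter budget is redistributed non-uniformly across the RNN cell gates. Below is a minimal sketch of one such update for a single LSTM weight matrix, not the authors' implementation: the PyTorch gate layout (4H rows stacked as input, forget, cell, output) is standard, but the gradient-proportional per-gate regrowth budget and all names are assumptions made for illustration, and the SNT-ASGD optimizer component is omitted.

```python
# Minimal sketch of one dynamic-sparse-training step for an LSTM weight matrix.
# Not the authors' code: the gradient-proportional, per-gate regrowth budget is
# an illustrative stand-in for the paper's non-uniform redistribution across cell gates.
import torch


def prune_and_regrow_gatewise(weight, mask, grad, prune_frac=0.3):
    """weight, grad, mask: tensors of shape (4*H, D); mask is binary (1 = active).

    Drops the smallest-magnitude active weights, then regrows the same number
    of connections, splitting the regrowth budget across the four gates.
    """
    H = weight.shape[0] // 4                      # PyTorch stacks gates as i, f, g, o
    gates = [slice(g * H, (g + 1) * H) for g in range(4)]

    k = int(prune_frac * mask.sum().item())       # number of connections to drop and regrow
    if k == 0:
        return mask

    # 1) Prune: drop the k active connections with the smallest magnitude.
    magnitude = (weight * mask).abs()
    magnitude[mask == 0] = float("inf")           # never "drop" inactive entries
    drop = torch.topk(magnitude.view(-1), k, largest=False).indices
    mask.view(-1)[drop] = 0.0

    # 2) Redistribute: give each gate a share of the regrowth budget proportional
    #    to the mean |gradient| over its currently inactive positions.
    score = torch.stack([grad[s][mask[s] == 0].abs().mean() for s in gates])
    score = torch.nan_to_num(score, nan=0.0) + 1e-8
    budget = (k * score / score.sum()).long()
    budget[-1] += k - int(budget.sum())           # hand the rounding remainder to the last gate

    # 3) Regrow: within each gate, activate the inactive entries with the largest |gradient|.
    for s, b in zip(gates, budget.tolist()):
        b = min(b, int((mask[s] == 0).sum().item()))
        if b <= 0:
            continue
        g = grad[s].abs()
        g[mask[s] == 1] = -float("inf")           # consider inactive entries only
        grow = torch.topk(g.view(-1), b).indices
        mask[s].view(-1)[grow] = 1.0
        weight[s].view(-1)[grow] = 0.0            # new connections start from zero

    return mask
```

In a training loop, an update like this would typically be applied at a fixed interval (for example once per epoch) to each sparse weight matrix, with the mask multiplied back into the weights after every optimizer step so the parameter count stays fixed.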
Related papers
- Multi-Objective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks [5.246498560938275]
We study the case of few-bit discrete-valued neural networks, both Binarized Neural Networks (BNNs) and Integer Neural Networks (INNs).
Our contribution is a multi-objective ensemble approach based on training a single NN for each possible pair of classes and applying a majority voting scheme to predict the final output.
We compare this BeMi approach to the current state-of-the-art in solver-based NN training and gradient-based training, focusing on BNN learning in few-shot contexts.
arXiv Detail & Related papers (2022-12-07T14:23:43Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Gradient Descent (tpSGD).
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation [10.972663738092063]
Spiking Neural Networks (SNNs) operate with asynchronous discrete events (or spikes).
We present a computationally-efficient training technique for deep SNNs.
We achieve top-1 accuracy of 65.19% for ImageNet dataset on SNN with 250 time steps, which is 10X faster compared to converted SNNs with similar accuracy.
arXiv Detail & Related papers (2020-05-04T19:30:43Z)
- A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks.
We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
arXiv Detail & Related papers (2020-04-15T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.