Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for
Efficient Training
- URL: http://arxiv.org/abs/2106.12089v1
- Date: Tue, 22 Jun 2021 22:44:32 GMT
- Title: Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for
Efficient Training
- Authors: Anup Sarma, Sonali Singh, Huaipan Jiang, Rui Zhang, Mahmut T Kandemir
and Chita R Das
- Abstract summary: We propose to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column- (row-) level hidden-state sparsity.
We conduct experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT-based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling on the CoNLL-2003 shared task.
- Score: 18.521882534906972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent Neural Networks (RNNs), more specifically their Long Short-Term
Memory (LSTM) variants, have been widely used as a deep learning tool for
tackling sequence-based learning tasks in text and speech. Training of such
LSTM applications is computationally intensive due to the recurrent nature of
hidden state computation that repeats for each time step. While sparsity in
Deep Neural Nets has been widely seen as an opportunity for reducing
computation time in both training and inference phases, the usage of non-ReLU
activation in LSTM RNNs renders the opportunities for such dynamic sparsity
associated with neuron activation and gradient values to be limited or
non-existent. In this work, we identify dropout-induced sparsity for LSTMs as a
suitable mode of computation reduction. Dropout is a widely used regularization
mechanism, which randomly drops computed neuron values during each iteration of
training. We propose to structure dropout patterns by dropping out the same set
of physical neurons within a batch, resulting in column- (row-) level
hidden-state sparsity, which is well suited to computation reduction at
run-time on general-purpose SIMD hardware as well as systolic arrays. We
conduct our experiments for three representative NLP tasks: language modelling
on the PTB dataset, OpenNMT-based machine translation using the IWSLT De-En and
En-Vi datasets, and named entity recognition sequence labelling using the
CoNLL-2003 shared task. We demonstrate that our proposed approach translates
dropout-based computation reduction into reduced training time, with
improvements ranging from 1.23x to 1.64x, without sacrificing the target metric.
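To make the column-level structure concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a dropout mask that is sampled once per iteration and shared across the batch; the function names (`structured_dropout_mask`, `recurrent_step_sparse`) and the tensor shapes are illustrative assumptions. Because every row of the hidden-state matrix has the same entries zeroed, the matching columns of the recurrent weight matrix can be skipped in the matmul entirely, which is where the run-time savings come from.

```python
# Minimal sketch of batch-shared ("structured") dropout on an LSTM hidden state,
# assuming PyTorch. Function names and shapes are illustrative, not the paper's code.
import torch


def structured_dropout_mask(hidden_size: int, p: float, device=None):
    """Sample one keep/drop mask per iteration, shared by the whole batch."""
    keep = torch.rand(hidden_size, device=device) >= p   # bool mask, one entry per neuron
    scale = 1.0 / (1.0 - p)                               # inverted-dropout scaling
    return keep, scale


def recurrent_step_dense(h, w_hh, keep, scale):
    """Reference step: mask h, then run the full (dense) recurrent matmul."""
    h_masked = h * keep.to(h.dtype) * scale               # same columns zeroed in every row
    return h_masked @ w_hh.t()                            # (batch, 4*hidden)


def recurrent_step_sparse(h, w_hh, keep, scale):
    """Equivalent step exploiting column sparsity: only surviving columns of h
    and the matching columns of w_hh take part in the matmul."""
    idx = keep.nonzero(as_tuple=True)[0]                  # indices of kept neurons
    h_kept = h.index_select(1, idx) * scale               # (batch, kept)
    w_kept = w_hh.index_select(1, idx)                    # (4*hidden, kept)
    return h_kept @ w_kept.t()                            # same output, ~(1-p) of the MACs


if __name__ == "__main__":
    batch, hidden, p = 32, 256, 0.5
    h = torch.randn(batch, hidden)
    w_hh = torch.randn(4 * hidden, hidden)                # LSTM recurrent weights (4 gates)
    keep, scale = structured_dropout_mask(hidden, p)
    dense = recurrent_step_dense(h, w_hh, keep, scale)
    sparse = recurrent_step_sparse(h, w_hh, keep, scale)
    print(torch.allclose(dense, sparse, atol=1e-5))       # True: identical result, fewer MACs
```

The point of the sketch is the contrast with standard dropout, which samples an independent mask per sequence in the batch and therefore yields only unstructured, element-wise zeros that SIMD or systolic hardware cannot easily skip.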
Related papers
- LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization [6.738409533239947]
Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption.
We propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain.
LLS achieves comparable performance with up to $300\times$ fewer multiply-accumulate (MAC) operations and half the memory requirements of BP.
arXiv Detail & Related papers (2024-05-24T18:24:24Z) - Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons [1.7056768055368383]
Spiking neural networks (SNN) are able to learn features while using less energy, especially on neuromorphic hardware.
The most widely used spiking neuron in deep learning is the leaky integrate-and-fire (LIF) neuron.
arXiv Detail & Related papers (2023-06-22T04:25:27Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Oscillatory Fourier Neural Network: A Compact and Efficient Architecture
for Sequential Processing [16.69710555668727]
We propose a novel neuron model that has a cosine activation with a time-varying component for sequential processing.
The proposed neuron provides an efficient building block for projecting sequential inputs into spectral domain.
Applying the proposed model to sentiment analysis on the IMDB dataset reaches 89.4% test accuracy within 5 epochs.
arXiv Detail & Related papers (2021-09-14T19:08:07Z) - Spiking Neural Networks with Improved Inherent Recurrence Dynamics for
Sequential Learning [6.417011237981518]
Spiking neural networks (SNNs) with leaky integrate-and-fire (LIF) neurons can be operated in an event-driven manner.
We show that SNNs can be trained for sequential tasks and propose modifications to a network of LIF neurons.
We then develop a training scheme to train the proposed SNNs with improved inherent recurrence dynamics.
arXiv Detail & Related papers (2021-09-04T17:13:28Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.