Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for
Efficient Training
- URL: http://arxiv.org/abs/2106.12089v1
- Date: Tue, 22 Jun 2021 22:44:32 GMT
- Title: Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for
Efficient Training
- Authors: Anup Sarma, Sonali Singh, Huaipan Jiang, Rui Zhang, Mahmut T Kandemir
and Chita R Das
- Abstract summary: We propose to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column- (row-) level hidden-state sparsity.
We conduct experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT-based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling on the CoNLL-2003 shared task.
- Score: 18.521882534906972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent Neural Networks (RNNs), more specifically their Long Short-Term
Memory (LSTM) variants, have been widely used as a deep learning tool for
tackling sequence-based learning tasks in text and speech. Training of such
LSTM applications is computationally intensive due to the recurrent nature of
hidden state computation that repeats for each time step. While sparsity in
Deep Neural Nets has been widely seen as an opportunity for reducing
computation time in both training and inference phases, the usage of non-ReLU
activation in LSTM RNNs renders the opportunities for such dynamic sparsity
associated with neuron activation and gradient values to be limited or
non-existent. In this work, we identify dropout-induced sparsity for LSTMs as a
suitable mode of computation reduction. Dropout is a widely used regularization
mechanism, which randomly drops computed neuron values during each iteration of
training. We propose to structure dropout patterns by dropping out the same set
of physical neurons within a batch, resulting in column- (row-) level
hidden-state sparsity, which is well suited to computation reduction at
run-time on general-purpose SIMD hardware as well as systolic arrays. We
conduct our experiments for three representative NLP tasks: language modelling
on the PTB dataset, OpenNMT-based machine translation using the IWSLT De-En and
En-Vi datasets, and named entity recognition sequence labelling using the
CoNLL-2003 shared task. We demonstrate that our proposed approach translates
dropout-based computation reduction into reduced training time, with
improvements ranging from 1.23x to 1.64x, without sacrificing the target metric.
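To make the column-level structure concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a dropout mask that is sampled once per iteration and shared across the batch; the function names (`structured_dropout_mask`, `recurrent_step_sparse`) and the tensor shapes are illustrative assumptions. Because every row of the hidden-state matrix has the same entries zeroed, the matching columns of the recurrent weight matrix can be skipped in the matmul entirely, which is where the run-time savings come from.

```python
# Minimal sketch of batch-shared ("structured") dropout on an LSTM hidden state,
# assuming PyTorch. Function names and shapes are illustrative, not the paper's code.
import torch


def structured_dropout_mask(hidden_size: int, p: float, device=None):
    """Sample one keep/drop mask per iteration, shared by the whole batch."""
    keep = torch.rand(hidden_size, device=device) >= p   # bool mask, one entry per neuron
    scale = 1.0 / (1.0 - p)                               # inverted-dropout scaling
    return keep, scale


def recurrent_step_dense(h, w_hh, keep, scale):
    """Reference step: mask h, then run the full (dense) recurrent matmul."""
    h_masked = h * keep.to(h.dtype) * scale               # same columns zeroed in every row
    return h_masked @ w_hh.t()                            # (batch, 4*hidden)


def recurrent_step_sparse(h, w_hh, keep, scale):
    """Equivalent step exploiting column sparsity: only surviving columns of h
    and the matching columns of w_hh take part in the matmul."""
    idx = keep.nonzero(as_tuple=True)[0]                  # indices of kept neurons
    h_kept = h.index_select(1, idx) * scale               # (batch, kept)
    w_kept = w_hh.index_select(1, idx)                    # (4*hidden, kept)
    return h_kept @ w_kept.t()                            # same output, ~(1-p) of the MACs


if __name__ == "__main__":
    batch, hidden, p = 32, 256, 0.5
    h = torch.randn(batch, hidden)
    w_hh = torch.randn(4 * hidden, hidden)                # LSTM recurrent weights (4 gates)
    keep, scale = structured_dropout_mask(hidden, p)
    dense = recurrent_step_dense(h, w_hh, keep, scale)
    sparse = recurrent_step_sparse(h, w_hh, keep, scale)
    print(torch.allclose(dense, sparse, atol=1e-5))       # True: identical result, fewer MACs
```

The point of the sketch is the contrast with standard dropout, which samples an independent mask per sequence in the batch and therefore yields only unstructured, element-wise zeros that SIMD or systolic hardware cannot easily skip.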
Related papers
- LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization [6.738409533239947]
Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption.
We propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain.
LLS achieves comparable performance with up to $300\times$ fewer multiply-accumulate (MAC) operations and half the memory requirements of BP.
arXiv Detail & Related papers (2024-05-24T18:24:24Z) - Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons [1.7056768055368383]
Spiking neural networks (SNN) are able to learn features while using less energy, especially on neuromorphic hardware.
The most widely used spiking neuron in deep learning is the leaky integrate-and-fire (LIF) neuron.
arXiv Detail & Related papers (2023-06-22T04:25:27Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Oscillatory Fourier Neural Network: A Compact and Efficient Architecture
for Sequential Processing [16.69710555668727]
We propose a novel neuron model that has a cosine activation with a time-varying component for sequential processing.
The proposed neuron provides an efficient building block for projecting sequential inputs into spectral domain.
Applying the proposed model to sentiment analysis on the IMDB dataset reaches 89.4% test accuracy within 5 epochs.
arXiv Detail & Related papers (2021-09-14T19:08:07Z) - Spiking Neural Networks with Improved Inherent Recurrence Dynamics for
Sequential Learning [6.417011237981518]
Spiking neural networks (SNNs) with leaky integrate-and-fire (LIF) neurons can be operated in an event-driven manner.
We show that SNNs can be trained for sequential tasks and propose modifications to a network of LIF neurons.
We then develop a training scheme to train the proposed SNNs with improved inherent recurrence dynamics.
arXiv Detail & Related papers (2021-09-04T17:13:28Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.