MomentumRNN: Integrating Momentum into Recurrent Neural Networks
- URL: http://arxiv.org/abs/2006.06919v2
- Date: Sun, 11 Oct 2020 17:19:06 GMT
- Title: MomentumRNN: Integrating Momentum into Recurrent Neural Networks
- Authors: Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J.
Osher, Bao Wang
- Abstract summary: We show that MomentumRNNs alleviate the vanishing gradient issue in training RNNs.
MomentumRNN is applicable to many types of recurrent cells, including those in state-of-the-art RNNs.
We show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework.
- Score: 32.40217829362088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing deep neural networks is an art that often involves an expensive
search over candidate architectures. To overcome this for recurrent neural nets
(RNNs), we establish a connection between the hidden state dynamics in an RNN
and gradient descent (GD). We then integrate momentum into this framework and
propose a new family of RNNs, called {\em MomentumRNNs}. We theoretically prove
and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient
issue in training RNNs. We study the momentum long short-term memory
(MomentumLSTM) and verify its advantages in convergence speed and accuracy over
its LSTM counterpart across a variety of benchmarks. We also demonstrate that
MomentumRNN is applicable to many types of recurrent cells, including those in
the state-of-the-art orthogonal RNNs. Finally, we show that other advanced
momentum-based optimization methods, such as Adam and Nesterov accelerated
gradients with a restart, can be easily incorporated into the MomentumRNN
framework for designing new recurrent cells with even better performance. The
code is available at https://github.com/minhtannguyen/MomentumRNN.
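The repository above contains the authors' implementation. As a rough illustration of the idea described in the abstract, the following is a minimal PyTorch-style sketch of a momentum-augmented recurrent cell. The class name `MomentumRNNCellSketch`, the update rule v_t = mu * v_{t-1} + s * (U x_t), h_t = tanh(W h_{t-1} + v_t), and the hyperparameters mu (momentum) and s (step size) are assumptions made for illustration, not the paper's exact cell.

```python
# Minimal sketch (not the authors' code) of a momentum-augmented recurrent cell,
# assuming the update  v_t = mu * v_{t-1} + s * (U x_t),  h_t = tanh(W h_{t-1} + v_t).
# The official implementation is at https://github.com/minhtannguyen/MomentumRNN.
import torch
import torch.nn as nn


class MomentumRNNCellSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, mu: float = 0.6, s: float = 0.6):
        super().__init__()
        self.Ux = nn.Linear(input_size, hidden_size, bias=True)    # input-to-hidden map U
        self.Wh = nn.Linear(hidden_size, hidden_size, bias=False)  # hidden-to-hidden map W
        self.mu, self.s = mu, s  # momentum coefficient and step size (assumed hyperparameters)

    def forward(self, x, h, v):
        # Momentum accumulation over the input drive, analogous to heavy-ball gradient descent.
        v = self.mu * v + self.s * self.Ux(x)
        # Standard recurrent update, with the momentum state replacing the raw input term.
        h = torch.tanh(self.Wh(h) + v)
        return h, v


if __name__ == "__main__":
    cell = MomentumRNNCellSketch(input_size=8, hidden_size=16)
    h = torch.zeros(4, 16)  # batch of 4 hidden states
    v = torch.zeros(4, 16)  # momentum (velocity) state carried alongside h
    for _ in range(10):
        x_t = torch.randn(4, 8)
        h, v = cell(x_t, h, v)
    print(h.shape, v.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```

Replacing the raw input drive with an accumulated momentum state is the substitution the abstract describes; per the abstract, the same pattern extends to LSTM and orthogonal-RNN cells, and swapping the heavy-ball update for an Adam- or Nesterov-style update yields the other variants mentioned. Consult the paper and repository for the actual formulations.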
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN), we reduce the computational time and space complexities in the sequence length from cubic and quadratic, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
- Adaptive-saturated RNN: Remember more with less instability [2.191505742658975]
This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two approaches.
Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors.
arXiv Detail & Related papers (2023-04-24T02:28:03Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- A Time Encoding approach to training Spiking Neural Networks [3.655021726150368]
Spiking Neural Networks (SNNs) have been gaining in popularity.
In this paper, we provide an extra tool to help us understand and train SNNs by using theory from the field of time encoding.
arXiv Detail & Related papers (2021-10-13T14:07:11Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Pruning of Deep Spiking Neural Networks through Gradient Rewiring [41.64961999525415]
Spiking Neural Networks (SNNs) have attracted great attention due to their biological plausibility and high energy efficiency on neuromorphic chips.
Most existing methods directly apply pruning approaches in artificial neural networks (ANNs) to SNNs, which ignore the difference between ANNs and SNNs.
We propose gradient rewiring (Grad R), a joint learning algorithm of connectivity and weight for SNNs, that enables us to seamlessly optimize network structure without retraining.
arXiv Detail & Related papers (2021-05-11T10:05:53Z)
- Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks [0.0]
Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs).
We propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balance and soft-reset mechanisms.
Our method is promising for implementation on embedded platforms with limited energy and memory that offer better support for SNNs.
arXiv Detail & Related papers (2021-02-28T12:04:22Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.