P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and
Paralleled ADMM Approach
- URL: http://arxiv.org/abs/2006.05622v3
- Date: Mon, 28 Mar 2022 11:05:37 GMT
- Title: P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and
Paralleled ADMM Approach
- Authors: Yu Tang, Zhigang Kan, Dequan Sun, Jingjing Xiao, Zhiquan Lai, Linbo
Qiao, Dongsheng Li
- Abstract summary: It is hard to train a Recurrent Neural Network (RNN) with stable convergence while avoiding gradient vanishing and exploding problems.
This work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously.
- Score: 17.603762011446843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is hard to train a Recurrent Neural Network (RNN) with stable
convergence while avoiding gradient vanishing and exploding problems, because
the weights in the recurrent unit are reused from iteration to iteration.
Moreover, RNN is sensitive to the initialization of its weights and biases,
which makes training difficult. The Alternating Direction Method of Multipliers
(ADMM) has become a promising algorithm for training neural networks beyond
traditional stochastic gradient algorithms, owing to its gradient-free updates
and robustness to unfavorable training conditions. However, ADMM cannot be
applied to RNN training directly, since the state in the recurrent unit is
repetitively updated over timesteps. Therefore, this work builds a new
framework named ADMMiRNN upon the unfolded form of RNN to address the above
challenges simultaneously. We also provide novel update rules and a theoretical
convergence analysis. Rather than using vanilla ADMM, we explicitly specify the
essential update rules in the ADMMiRNN iterations, with constructed
approximation techniques and solutions to each sub-problem. Numerical
experiments are conducted on MNIST, IMDb, and text classification tasks.
ADMMiRNN achieves convergent results and outperforms the compared baselines.
Furthermore, ADMMiRNN trains RNN more stably than stochastic gradient
algorithms, without gradient vanishing or exploding. We also provide a
distributed parallel algorithm for ADMMiRNN, named P-ADMMiRNN, comprising
Synchronous Parallel ADMMiRNN (SP-ADMMiRNN) and Asynchronous Parallel ADMMiRNN
(AP-ADMMiRNN), which is the first work to train RNN with ADMM in an
asynchronous parallel manner. The source code is publicly available.
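For intuition, the sketch below illustrates the general ADMM-on-an-unfolded-RNN idea that ADMMiRNN builds on: the recurrence is unfolded over timesteps, auxiliary variables are introduced for the pre-activations and hidden states, and the shared weights, the auxiliary variables, and the dual variables are updated in alternation instead of by backpropagation. The variable names, the quadratic-penalty handling of the tanh constraint, and the averaged s-update below are simplifying assumptions made for this sketch; they are not the paper's exact update rules, approximation techniques, or its SP-/AP-parallel schemes.

```python
import numpy as np

# Minimal illustrative sketch: ADMM-style alternating updates on an unfolded
# vanilla tanh RNN. Variable names, the averaged s-update, and the penalty
# handling are assumptions for this sketch, not ADMMiRNN's actual rules.

rng = np.random.default_rng(0)
T, d_in, d_h = 12, 3, 4           # timesteps, input size, hidden size
x = rng.normal(size=(T, d_in))    # one unfolded input sequence
rho = 1.0                         # ADMM penalty parameter

# Shared recurrent parameters, repeated across the unfolded timesteps.
W = rng.normal(scale=0.1, size=(d_h, d_in))
U = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

# Auxiliary variables tied by s_t = W x_t + U h_{t-1} + b and h_t = tanh(s_t).
h = np.zeros((T + 1, d_h))                                  # h[0] = initial state
s = np.array([W @ x[t] + U @ h[t] + b for t in range(T)])   # pre-activations
lam = np.zeros((T, d_h))                                    # duals for the affine constraint

for it in range(30):
    # 1) s-update: average the affine target (shifted by the scaled dual)
    #    with the value implied by the current hidden state via tanh^{-1}.
    for t in range(T):
        affine = W @ x[t] + U @ h[t] + b
        s[t] = (0.5 * (affine - lam[t] / rho)
                + 0.5 * np.arctanh(np.clip(h[t + 1], -0.99, 0.99)))

    # 2) h-update: enforce the nonlinearity exactly in this sketch.
    for t in range(T):
        h[t + 1] = np.tanh(s[t])

    # 3) (W, U, b)-update: one least-squares fit over all timesteps, since
    #    the same weights appear at every step of the unfolded graph.
    Z = np.hstack([x, h[:T], np.ones((T, 1))])     # rows are [x_t, h_{t-1}, 1]
    theta, *_ = np.linalg.lstsq(Z, s + lam / rho, rcond=None)
    W, U, b = theta[:d_in].T, theta[d_in:d_in + d_h].T, theta[-1]

    # 4) Dual ascent on the residual of the affine constraint.
    resid = s - np.array([W @ x[t] + U @ h[t] + b for t in range(T)])
    lam += rho * resid

print("primal residual after 30 sweeps:", np.linalg.norm(resid))
```

Because each block update touches only per-timestep quantities plus the shared weights, alternating schemes of this form lend themselves to synchronous or asynchronous parallel execution, which is the general direction the P-ADMMiRNN variants take.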
Related papers
- Adaptive-saturated RNN: Remember more with less instability [2.191505742658975]
This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two approaches.
Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors.
arXiv Detail & Related papers (2023-04-24T02:28:03Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars [4.184276171116354]
Spiking Neural Networks (SNNs) have emerged as a low-power alternative to Artificial Neural Networks (ANNs).
We study the effect of crossbar non-idealities and intrinsic stochasticity on the performance of SNNs.
arXiv Detail & Related papers (2022-06-20T07:07:41Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs)
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- slimTrain -- A Stochastic Approximation Method for Training Separable Deep Neural Networks [2.4373900721120285]
Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications.
We propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters.
arXiv Detail & Related papers (2021-09-28T19:31:57Z)
- Skip-Connected Self-Recurrent Spiking Neural Networks with Joint Intrinsic Parameter and Synaptic Weight Training [14.992756670960008]
We propose a new type of RSNN called Skip-Connected Self-Recurrent SNNs (ScSr-SNNs)
ScSr-SNNs can boost performance by up to 2.55% compared with other types of RSNNs trained by state-of-the-art BP methods.
arXiv Detail & Related papers (2020-10-23T22:27:13Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
- MomentumRNN: Integrating Momentum into Recurrent Neural Networks [32.40217829362088]
We show that MomentumRNNs alleviate the vanishing gradient issue in training RNNs.
MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art RNNs.
We show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework (a rough sketch of the momentum-cell idea appears after this list).
arXiv Detail & Related papers (2020-06-12T03:02:29Z)
- BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal: it can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers.
It is the first time that a weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z)
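To make the MomentumRNN entry above concrete, here is a rough, hypothetical sketch of folding a momentum buffer into a vanilla tanh recurrent cell. The function name, the placement of the momentum step on the input-driven term, and the constants mu and s are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

# Hypothetical momentum-augmented recurrent cell (illustration only; the
# actual MomentumRNN formulation may place the momentum term differently).
def momentum_rnn_forward(x_seq, W, U, b, mu=0.6, s=1.0):
    """x_seq: (T, d_in); W: (d_h, d_in); U: (d_h, d_h); b: (d_h,)."""
    d_h = U.shape[0]
    h = np.zeros(d_h)   # hidden state
    v = np.zeros(d_h)   # momentum buffer carried across timesteps
    states = []
    for x_t in x_seq:
        v = mu * v + s * (W @ x_t)      # momentum step on the input-driven term
        h = np.tanh(U @ h + v + b)      # usual recurrence on the hidden state
        states.append(h)
    return np.stack(states)

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
T, d_in, d_h = 6, 3, 4
H = momentum_rnn_forward(rng.normal(size=(T, d_in)),
                         rng.normal(scale=0.1, size=(d_h, d_in)),
                         rng.normal(scale=0.1, size=(d_h, d_h)),
                         np.zeros(d_h))
print(H.shape)  # (6, 4)
```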
This list is automatically generated from the titles and abstracts of the papers on this site.