Inertial Proximal Deep Learning Alternating Minimization for Efficient
Neural Network Training
- URL: http://arxiv.org/abs/2102.00267v1
- Date: Sat, 30 Jan 2021 16:40:08 GMT
- Title: Inertial Proximal Deep Learning Alternating Minimization for Efficient
Neural Network Training
- Authors: Linbo Qiao, Tao Sun, Hengyue Pan, Dongsheng Li
- Abstract summary: This work develops an improved DLAM via the well-known inertial technique, namely iPDLAM, which predicts a point by linearization of the current and last iterates.
Numerical results on real-world datasets are reported to demonstrate the efficiency of our proposed algorithm.
- Score: 16.165369437324266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the Deep Learning Alternating Minimization (DLAM), which is
actually alternating minimization applied to the penalty form of deep
neural network training, has been developed as an alternative algorithm to
overcome several drawbacks of Stochastic Gradient Descent (SGD) algorithms.
This work develops an improved DLAM via the well-known inertial technique,
namely iPDLAM, which predicts a point by linearization of the current and last
iterates. To further accelerate training, we apply a warm-up technique to the
penalty parameter, that is, starting with a small initial value and increasing it
over the iterations. Numerical results on real-world datasets are reported to
demonstrate the efficiency of our proposed algorithm.
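
The two ingredients described in the abstract lend themselves to a short illustration: (i) an inertial prediction step that linearly extrapolates the current and last iterates, and (ii) a warm-up schedule that starts the penalty parameter small and increases it over the iterations. The sketch below applies both ideas to a toy one-layer quadratic-penalty problem solved by alternating minimization. The penalty form, the closed-form updates, and all parameter values (the inertial weight `beta` and the `rho` schedule) are illustrative assumptions for exposition, not the paper's exact iPDLAM updates.

```python
import numpy as np

# Toy illustration (not the paper's exact algorithm): a one-layer linear model
# y ~= W x, trained by alternating minimization on the penalty objective
#   L(W, z) = 0.5 * ||z - y||^2 + (rho / 2) * ||z - W x||^2,
# where z is an auxiliary output variable and rho is the penalty parameter.

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((d, n))            # inputs, one column per sample
W_true = rng.standard_normal((1, d))
y = W_true @ X                             # noise-free targets

W = np.zeros((1, d))
z = np.zeros_like(y)
z_prev = z.copy()

beta = 0.7                                 # inertial weight -- hypothetical value
rho, rho_max, growth = 0.1, 10.0, 1.2      # warm-up schedule -- hypothetical values

for k in range(100):
    # Inertial prediction: linear extrapolation of current and last iterates.
    z_hat = z + beta * (z - z_prev)

    # W-update: least-squares fit of W x to the predicted auxiliary variable.
    W = z_hat @ np.linalg.pinv(X)

    # z-update: closed-form minimizer of the penalty objective in z.
    z_prev = z
    z = (y + rho * (W @ X)) / (1.0 + rho)

    # Warm-up: start with a small penalty parameter and increase it each iteration.
    rho = min(rho_max, rho * growth)

print("relative fit error:", np.linalg.norm(W @ X - y) / np.linalg.norm(y))
```

In the full iPDLAM setting the same extrapolation and warm-up would be applied within the layer-wise DLAM updates of a deep network; the toy problem here only serves to make the two mechanisms concrete.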
Related papers
- Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
PINNs are, however, prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - Taming neural networks with TUSLA: Non-convex learning via adaptive
stochastic gradient Langevin algorithms [0.0]
We offer a new learning algorithm based on an appropriately constructed variant of stochastic gradient Langevin dynamics (SGLD).
We also provide a nonasymptotic analysis of the new algorithm's convergence properties.
The roots of the TUSLA algorithm lie in the taming technology developed for the tamed Euler scheme.
arXiv Detail & Related papers (2020-06-25T16:06:22Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Tune smarter not harder: A principled approach to tuning learning rates
for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z) - DDPNOpt: Differential Dynamic Programming Neural Optimizer [29.82841891919951]
We show that most widely-used training algorithms can be linked to Differential Dynamic Programming (DDP).
In this vein, we propose a new class of optimizer, DDPNOpt, for training feedforward and convolutional networks.
arXiv Detail & Related papers (2020-02-20T15:42:15Z) - Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated in a backward fashion, and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
arXiv Detail & Related papers (2020-02-10T03:26:09Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.