Semi-Implicit Back Propagation
- URL: http://arxiv.org/abs/2002.03516v1
- Date: Mon, 10 Feb 2020 03:26:09 GMT
- Title: Semi-Implicit Back Propagation
- Authors: Ren Liu, Xiaoqun Zhang
- Abstract summary: We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated in a backward fashion, and the parameters are updated via proximal mappings.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm achieves better performance in terms of both loss reduction and training/validation accuracy.
- Score: 1.5533842336139065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have attracted great attention for a long time, and many researchers are devoted to improving the effectiveness of neural network training algorithms. Though stochastic gradient descent (SGD) and other explicit gradient-based methods are widely adopted, there are still many challenges, such as vanishing gradients and small step sizes, which lead to slow convergence and instability of SGD algorithms. Motivated by error back propagation (BP) and proximal methods, we propose a semi-implicit back propagation method for neural network training. Similar to BP, the differences on the neurons are propagated in a backward fashion, and the parameters are updated via proximal mappings. The implicit updates for both the hidden neurons and the parameters allow large step sizes to be chosen in the training algorithm. Finally, we also show that any fixed point of a convergent sequence produced by this algorithm is a stationary point of the objective loss function. Experiments on both MNIST and CIFAR-10 demonstrate that the proposed semi-implicit BP algorithm achieves better performance, in terms of both loss reduction and training/validation accuracy, compared to SGD and the similar algorithm ProxBP.
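The sketch below illustrates the flavor of the method for a two-layer network with squared loss: the difference at the output is propagated back to a target for the hidden neurons, and each weight matrix is then updated by a proximal (ridge-like) solve rather than an explicit gradient step. The layer model, the penalty, and the way the ReLU is handled are my own simplifications, not the authors' exact subproblems.

```python
import numpy as np

rng = np.random.default_rng(0)

def prox_linear_update(W_old, A_in, Z_target, tau):
    """Solve min_W ||W A_in - Z_target||_F^2 + (1/(2*tau)) * ||W - W_old||_F^2.

    This regularized least-squares solve is the proximal (implicit) counterpart
    of an explicit gradient step on W and stays stable even for large tau.
    """
    n_in = A_in.shape[0]
    lhs = 2.0 * A_in @ A_in.T + np.eye(n_in) / tau      # (n_in, n_in)
    rhs = 2.0 * Z_target @ A_in.T + W_old / tau         # (n_out, n_in)
    return rhs @ np.linalg.inv(lhs)

# Toy two-layer network y ~ W2 @ relu(W1 @ x) with squared loss
X = rng.standard_normal((4, 32))        # 4 features, 32 samples
Y = rng.standard_normal((3, 32))        # 3 outputs
W1 = rng.standard_normal((8, 4)) * 0.1
W2 = rng.standard_normal((3, 8)) * 0.1
tau, eta = 10.0, 0.5                    # the proximal solves tolerate a large tau

for _ in range(50):
    H = np.maximum(W1 @ X, 0.0)         # forward pass
    out = W2 @ H

    # Backward step on the *neurons*: propagate the output difference to a
    # desired hidden activation, clamped to stay feasible for the ReLU.
    H_target = np.maximum(H - eta * W2.T @ (out - Y), 0.0)

    # Implicit (proximal) updates of the parameters, layer by layer. Fitting the
    # pre-activation W1 @ X to H_target ignores the ReLU -- a simplification.
    W2 = prox_linear_update(W2, H, Y, tau)
    W1 = prox_linear_update(W1, X, H_target, tau)

print("final loss:", 0.5 * np.mean((W2 @ np.maximum(W1 @ X, 0.0) - Y) ** 2))
```

Because each weight update is an exact regularized solve, the scheme remains stable for a large tau, which mirrors the large-step-size claim in the abstract.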
Related papers
- Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
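For intuition, here is a loose sketch of generic weight-perturbation learning on a toy RNN; it is not the paper's specific rule, only the shared idea that a random perturbation plus the resulting loss change yields a (noisy) gradient estimate from forward passes alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_loss(Wx, Wh, Wo, xs, y):
    """Run a small tanh RNN over the sequence and score the final output."""
    h = np.zeros(Wh.shape[0])
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
    return 0.5 * np.sum((Wo @ h - y) ** 2)

# Toy task: map a random 10-step sequence to a fixed 2-dimensional target
xs = rng.standard_normal((10, 3))
y = rng.standard_normal(2)
Wx = rng.standard_normal((8, 3)) * 0.1
Wh = rng.standard_normal((8, 8)) * 0.1
Wo = rng.standard_normal((2, 8)) * 0.1

sigma, lr = 1e-2, 0.01
for step in range(500):
    base = rnn_loss(Wx, Wh, Wo, xs, y)
    # Sample one perturbation per weight matrix (forward passes only, no BPTT)
    dWx, dWh, dWo = (sigma * rng.standard_normal(W.shape) for W in (Wx, Wh, Wo))
    delta = rnn_loss(Wx + dWx, Wh + dWh, Wo + dWo, xs, y) - base
    # Move against directions that increased the loss (a noisy gradient estimate;
    # practical methods average many perturbations to reduce variance)
    Wx -= lr * (delta / sigma**2) * dWx
    Wh -= lr * (delta / sigma**2) * dWh
    Wo -= lr * (delta / sigma**2) * dWo

print("final loss:", rnn_loss(Wx, Wh, Wo, xs, y))
```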
arXiv Detail & Related papers (2024-05-14T21:15:29Z) - A Bootstrap Algorithm for Fast Supervised Learning [0.0]
Training a neural network (NN) typically relies on some type of curve-following method, such as gradient descent (and stochastic gradient descent (SGD)), ADADELTA, ADAM, or limited-memory algorithms.
Convergence for these algorithms usually requires access to a large quantity of observations to achieve high accuracy and, for certain classes of functions, these algorithms can take multiple epochs over the data to catch on.
Herein, a different technique with the potential of achieving dramatically better speeds of convergence is explored: it does not curve-follow but rather relies on 'decoupling' hidden layers and on...
arXiv Detail & Related papers (2023-05-04T18:28:18Z) - Membrane Potential Distribution Adjustment and Parametric Surrogate
Gradient in Spiking Neural Networks [3.485537704990941]
The surrogate gradient (SG) strategy is investigated and applied to circumvent the non-differentiability of spiking neurons and to train SNNs from scratch.
We propose the parametric surrogate gradient (PSG) method to iteratively update SG and eventually determine an optimal surrogate gradient parameter.
Experimental results demonstrate that the proposed methods can be readily integrated with the backpropagation through time (BPTT) algorithm.
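As a rough illustration of the surrogate-gradient mechanism this builds on (not the PSG update itself), the sketch below trains a single thresholded layer with a fast-sigmoid-style surrogate whose sharpness parameter alpha is simply annealed on a schedule; PSG instead derives its parameter update from the membrane-potential distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class problem on 10-dimensional inputs
d, C, N = 10, 3, 500
X = rng.standard_normal((N, d))
labels = (X @ rng.standard_normal((d, C))).argmax(axis=1)
Y = np.eye(C)[labels]

W = rng.standard_normal((d, C)) * 0.1
theta_thr, lr = 0.0, 0.5
for epoch in range(200):
    alpha = 1.0 + 0.05 * epoch                      # annealed surrogate sharpness
    V = X @ W                                       # membrane potentials
    S = (V > theta_thr).astype(float)               # hard, non-differentiable spikes
    # Backward pass: replace dS/dV with a fast-sigmoid-style surrogate derivative
    surrogate = 1.0 / (1.0 + alpha * np.abs(V - theta_thr)) ** 2
    dV = (S - Y) * surrogate / N
    W -= lr * X.T @ dV

print("train accuracy:", ((X @ W).argmax(axis=1) == labels).mean())
```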
arXiv Detail & Related papers (2023-04-26T05:02:41Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on BP optimization.
Unlike FF, our framework directly outputs a label distribution at each cascaded block, so no additional negative samples need to be generated.
In our framework, each block can be trained independently, so it can be easily deployed on parallel acceleration systems.
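A minimal sketch of the cascaded, locally trained idea is given below; the block architecture, the random-feature transforms, and the averaging of block predictions are my own simplifications rather than the CaFo recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_local_head(feats, labels, n_classes, lr=0.1, steps=200):
    """Fit a linear softmax head on this block's features only (local CE loss)."""
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        probs = softmax(feats @ W)
        W -= lr * feats.T @ (probs - onehot) / len(labels)
    return W

# Toy 3-class problem on 20-dimensional inputs
X = rng.standard_normal((300, 20))
y = (X @ rng.standard_normal((20, 3))).argmax(axis=1)

# Cascade of blocks; here each block is a fixed random ReLU projection and only
# the per-block heads are learned. No gradient crosses a block boundary, so the
# heads can be trained independently (and in parallel) with their own CE losses.
blocks = [rng.standard_normal((20, 16)) * 0.3, rng.standard_normal((16, 16)) * 0.3]
feats, heads = X, []
for B in blocks:
    feats = np.maximum(feats @ B, 0.0)
    heads.append(train_local_head(feats, y, n_classes=3))

# Inference: average the blocks' label distributions (one simple aggregation choice)
feats, votes = X, []
for B, W in zip(blocks, heads):
    feats = np.maximum(feats @ B, 0.0)
    votes.append(softmax(feats @ W))
print("train accuracy:", (np.mean(votes, axis=0).argmax(axis=1) == y).mean())
```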
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
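The core implicit update can be illustrated on a stand-in least-squares loss (full batch here for brevity; ISGD applies the same idea per mini-batch, and a real PINN residual would replace the quadratic loss):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
H = A.T @ A                              # Hessian of 0.5 * ||A theta - b||^2

theta = np.zeros(10)
eta = 1.0                                # far beyond the stable range of explicit GD here
for _ in range(20):
    # Implicit step: theta_new = theta - eta * grad(theta_new).
    # For a quadratic loss the implicit equation is linear and can be solved exactly:
    #   (I + eta * H) theta_new = theta + eta * A^T b
    theta = np.linalg.solve(np.eye(10) + eta * H, theta + eta * A.T @ b)

print("loss:", 0.5 * np.linalg.norm(A @ theta - b) ** 2)
```

Unlike the explicit step, the implicit update stays stable for very large eta, which is what makes it attractive for hard-to-train losses.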
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Gradient Descent (tpSGD)
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive-learning-rate-based algorithms in terms of training speed and test error.
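A minimal reading of that rule, sketched below under my own simplified assumptions (not the exact AdaRem update): keep an exponential moving average of past parameter changes and scale each parameter's learning rate up or down depending on whether the current descent direction agrees with it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in problem: least squares, gradient of 0.5 * ||A w - b||^2 / len(b)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
grad = lambda w: A.T @ (A @ w - b) / len(b)

w = np.zeros(20)
avg_change = np.zeros(20)       # EMA of past parameter changes ("memory" of directions)
base_lr, beta, adapt = 0.05, 0.9, 0.5

for _ in range(300):
    g = grad(w)
    # Alignment in {-1, 0, +1}: +1 if past changes and the current descent direction agree
    align = np.sign(-g) * np.sign(avg_change)
    lr = base_lr * (1.0 + adapt * align)      # per-parameter learning rate
    step = -lr * g
    w += step
    avg_change = beta * avg_change + (1.0 - beta) * step

print("loss:", 0.5 * np.mean((A @ w - b) ** 2))
```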
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data, where the predictive model is a deep neural network.
Our method requires far fewer communication rounds while still enjoying convergence guarantees in theory.
Experiments on several datasets demonstrate the effectiveness of our method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.