Target Propagation via Regularized Inversion
- URL: http://arxiv.org/abs/2112.01453v1
- Date: Thu, 2 Dec 2021 17:49:25 GMT
- Title: Target Propagation via Regularized Inversion
- Authors: Vincent Roulet and Zaid Harchaoui
- Abstract summary: We present a simple version of target propagation based on regularized inversion of network layers, easily implementable in a differentiable programming framework.
We show how our TP can be used to train recurrent neural networks with long sequences on various sequence modeling problems.
- Score: 4.289574109162585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Target Propagation (TP) algorithms compute targets instead of gradients along
neural networks and propagate them backward in a way that is similar to, yet
distinct from, gradient back-propagation (BP). The idea was first presented as
a perturbative alternative to back-propagation that may achieve greater
accuracy in gradient evaluation when training multi-layer neural networks
(LeCun et al., 1989). However, TP has remained more of a template algorithm
with many variations than a well-identified algorithm. Revisiting insights of
LeCun et al. (1989) and more recently of Lee et al. (2015), we present a
simple version of target propagation based on regularized inversion of network
layers, easily implementable in a differentiable programming framework. We
compare its computational complexity to that of BP and delineate the regimes
in which TP can be attractive compared to BP. We show how our TP can be used to
train recurrent neural networks with long sequences on various sequence
modeling problems. The experimental results underscore the importance of
regularization in TP in practice.
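To make the idea concrete, here is a minimal sketch (in JAX, not the authors' code) of how a regularized layer inversion could be realized in a differentiable programming framework: each backward step computes a target for the previous layer by approximately solving a small regularized least-squares inversion problem with a few gradient steps. The function names (`regularized_inverse`, `propagate_targets`, `update_layer`), the tanh dense layers, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of target propagation with regularized layer inversion.
# Assumptions (not from the paper): tanh dense layers, a fixed number of inner
# gradient steps for each inversion, and hand-picked hyperparameters.
import jax
import jax.numpy as jnp


def layer(params, x):
    W, b = params
    return jnp.tanh(W @ x + b)


def regularized_inverse(params, target, x_prev, reg=0.1, steps=25, lr=0.5):
    """Approximately solve min_z ||f(z) - target||^2 + reg * ||z - x_prev||^2.

    The regularization keeps the inverted point close to the forward
    activation x_prev, which is what makes the inversion well-posed.
    """
    def objective(z):
        fit = jnp.sum((layer(params, z) - target) ** 2)
        prox = reg * jnp.sum((z - x_prev) ** 2)
        return fit + prox

    grad_obj = jax.grad(objective)
    z = x_prev  # warm-start at the forward activation
    for _ in range(steps):
        z = z - lr * grad_obj(z)
    return z


def propagate_targets(params_list, activations, final_target, reg=0.1):
    """Backward sweep; activations[k] is the output of layer k, activations[0] the input."""
    L = len(params_list)
    targets = [None] * (L + 1)
    targets[L] = final_target
    for k in range(L, 0, -1):
        targets[k - 1] = regularized_inverse(
            params_list[k - 1], targets[k], activations[k - 1], reg=reg)
    return targets


def update_layer(params, x_prev, target, lr=0.1):
    """Each layer is trained locally to map its input toward its target."""
    local_loss = lambda p: jnp.sum((layer(p, x_prev) - target) ** 2)
    grads = jax.grad(local_loss)(params)
    return tuple(p - lr * g for p, g in zip(params, grads))
```

The warm start at the forward activation and the proximal term are where the regularization enters; as the abstract notes, this regularization matters in practice.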
Related papers
- Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD).
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z)
- A comparative study of back propagation and its alternatives on multilayer perceptrons [0.0]
The de facto algorithm for training the backward pass of a feedforward neural network is backpropagation (BP).
The use of almost-everywhere differentiable activation functions made it efficient and effective to propagate the gradient backwards through layers of deep neural networks.
In this paper, we analyze the stability and similarity of predictions and neurons in convolutional neural networks (CNNs) and propose a new variation of one of the algorithms.
arXiv Detail & Related papers (2022-05-31T18:44:13Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that a likelihood ratio loss with interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- A Theoretical Framework for Target Propagation [75.52598682467817]
We analyze target propagation (TP), a popular but not yet fully understood alternative to backpropagation (BP).
Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP.
We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training.
arXiv Detail & Related papers (2020-06-25T12:07:06Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
- Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches [0.40611352512781856]
We show that it is possible to consider a multi-layer neural network as a stack of adaptive filters.
We introduce a better algorithm by predicting and then amending the adverse consequences of the actions that take place in BP, even before they happen.
Our experiments show the usefulness of our algorithm in the training of deep neural networks.
arXiv Detail & Related papers (2020-03-11T08:45:36Z)
- Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated backward, and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
arXiv Detail & Related papers (2020-02-10T03:26:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.