Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum
- URL: http://arxiv.org/abs/2212.07282v4
- Date: Mon, 21 Aug 2023 17:17:26 GMT
- Title: Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum
- Authors: Florian Bacho and Dominique Chu
- Abstract summary: We propose an algorithm that combines Activity-Perturbed Forward Gradients with Direct Feedback Alignment and momentum.
Our approach enables faster convergence and better performance when compared to other local alternatives to backpropagation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Supervised learning in deep neural networks is commonly performed using error
backpropagation. However, the sequential propagation of errors during the
backward pass limits its scalability and applicability to low-powered
neuromorphic hardware. Therefore, there is growing interest in finding local
alternatives to backpropagation. Recently proposed methods based on
forward-mode automatic differentiation suffer from high variance in large deep
neural networks, which affects convergence. In this paper, we propose the
Forward Direct Feedback Alignment algorithm that combines Activity-Perturbed
Forward Gradients with Direct Feedback Alignment and momentum. We provide both
theoretical proofs and empirical evidence that our proposed method achieves
lower variance than forward gradient techniques. In this way, our approach
enables faster convergence and better performance when compared to other local
alternatives to backpropagation and opens a new perspective for the development
of online learning algorithms compatible with neuromorphic systems.
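The abstract names the ingredients (activity-perturbed forward gradients, Direct Feedback Alignment, momentum) but not how they are wired together. The following is a minimal numpy sketch of those three components on a toy two-layer network; the layer sizes, learning rate, momentum constant, single dummy sample, and the simple averaging of the two feedback signals are all my own assumptions, and this is not the authors' FDFA algorithm.

```python
# Illustrative sketch only: activity-perturbed forward gradients + DFA-style
# random feedback + momentum, combined under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 4

W1 = rng.normal(0.0, 0.3, (n_hid, n_in))
W2 = rng.normal(0.0, 0.3, (n_out, n_hid))
B1 = rng.normal(0.0, 0.3, (n_hid, n_out))      # fixed random feedback matrix (DFA)
m1, m2 = np.zeros_like(W1), np.zeros_like(W2)  # momentum buffers
beta, lr, eps = 0.9, 0.05, 1e-4

x = rng.normal(size=n_in)                      # dummy input
t = rng.normal(size=n_out)                     # dummy target

def forward(x):
    h = np.tanh(W1 @ x)
    return h, W2 @ h

for _ in range(100):
    h, y = forward(x)
    e = y - t                                  # output error (squared loss)

    # Activity-perturbed forward gradient: a random direction v in activation
    # space and a finite-difference stand-in for the forward-mode directional
    # derivative give the unbiased estimate (v . dL/dh) v of dL/dh.
    v = rng.normal(size=n_hid)
    d = (0.5 * np.sum((W2 @ (h + eps * v) - t) ** 2)
         - 0.5 * np.sum(e ** 2)) / eps
    g_h_fwd = d * v

    # DFA: route the output error through the fixed random matrix B1 instead
    # of the transposed forward weights.
    g_h_dfa = B1 @ e

    # Combine the two hidden-layer signals (plain averaging here, purely an
    # illustrative choice) and smooth the weight updates with momentum.
    g_h = 0.5 * (g_h_fwd + g_h_dfa)
    g_W1 = np.outer(g_h * (1.0 - h ** 2), x)   # through the tanh derivative
    g_W2 = np.outer(e, h)                      # exact output-layer gradient

    m1 = beta * m1 + (1.0 - beta) * g_W1
    m2 = beta * m2 + (1.0 - beta) * g_W2
    W1 -= lr * m1
    W2 -= lr * m2

_, y = forward(x)
print("final squared loss:", 0.5 * np.sum((y - t) ** 2))
```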
Related papers
- Correlations Are Ruining Your Gradient Descent [1.2432046687586285]
Natural gradient descent illuminates how gradient vectors, pointing at directions of steepest descent, can be improved by considering the local curvature of loss landscapes.
We show that correlations in the data at any linear transformation, including node responses at every layer of a neural network, cause a non-orthonormal relationship between the model's parameters.
We describe a range of methods which have been proposed for decorrelation and whitening of node output, and expand on these to provide a novel method specifically useful for distributed computing and computational neuroscience.
arXiv Detail & Related papers (2024-07-15T14:59:43Z)
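The entry above surveys decorrelation and whitening of node outputs. For reference only, a standard ZCA whitening of a batch of layer activations (a generic technique, not the decorrelation rule proposed in that paper) can be written as:

```python
# Generic ZCA whitening of a batch of activations; illustrative reference,
# not the paper's proposed decorrelation method.
import numpy as np

def zca_whiten(acts, eps=1e-8):
    """acts: (batch, features) layer outputs -> decorrelated, unit-variance outputs."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(acts) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitener

rng = np.random.default_rng(1)
batch = rng.normal(size=(256, 32)) @ rng.normal(size=(32, 32))  # correlated features
white = zca_whiten(batch)
print(np.abs(np.cov(white, rowvar=False) - np.eye(32)).max())   # ~0: decorrelated
```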
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
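The observation in the entry above, that fixing a layer's input and output leaves the weights as solutions of an under-determined linear system, can be checked directly: many different weight matrices realize the same input-output pair. The construction below is my own toy illustration, not the paper's optimum-shifting procedure.

```python
# Toy check: with input x and output y fixed, W x = y is under-determined in
# the entries of W, so the weights can be moved without changing this output.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W = rng.normal(size=(3, 5))
y = W @ x                                    # fix the layer's input/output pair

delta = rng.normal(size=(3, 5))
delta -= np.outer(delta @ x, x) / (x @ x)    # project each row orthogonal to x
W_shifted = W + delta                        # a different solution of W' x = y
print(np.allclose(W_shifted @ x, y))         # True: same output, new weights
```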
- A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique [0.0]
We propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer.
Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets.
arXiv Detail & Related papers (2023-08-09T16:41:00Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
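The variance claim in the entry above can be reproduced on a toy linear layer: perturbing the low-dimensional activations yields a visibly lower-variance single-sample gradient estimate than perturbing the high-dimensional weights. The setup below is my own toy construction (finite differences stand in for exact forward-mode directional derivatives), not the paper's experiments.

```python
# Compare weight-perturbed vs. activity-perturbed forward-gradient estimates
# of dL/dW for one linear layer with squared loss.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, trials, eps = 64, 8, 2000, 1e-4
W = rng.normal(size=(n_out, n_in))
x = rng.normal(size=n_in)
t = rng.normal(size=n_out)

def loss(Wm):
    return 0.5 * np.sum((Wm @ x - t) ** 2)

true_grad = np.outer(W @ x - t, x)
err_w, err_a = [], []
for _ in range(trials):
    # Weight perturbation: random direction in the n_out*n_in weight space.
    V = rng.normal(size=W.shape)
    d = (loss(W + eps * V) - loss(W)) / eps                          # ~ <V, dL/dW>
    err_w.append((d * V - true_grad) ** 2)
    # Activity perturbation: random direction in the n_out activation space.
    v = rng.normal(size=n_out)
    d = (0.5 * np.sum((W @ x + eps * v - t) ** 2) - loss(W)) / eps   # ~ v . dL/dy
    err_a.append((np.outer(d * v, x) - true_grad) ** 2)              # chain rule to W
print("weight-perturbed estimator MSE:  ", np.mean(err_w))
print("activity-perturbed estimator MSE:", np.mean(err_a))
```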
- A comparative study of back propagation and its alternatives on multilayer perceptrons [0.0]
The de facto algorithm for training feedforward neural networks is backpropagation (BP).
The use of almost-everywhere differentiable activation functions made it efficient and effective to propagate the gradient backwards through layers of deep neural networks.
In this paper, we analyze the stability and similarity of predictions and neurons in convolutional neural networks (CNNs) and propose a new variation of one of the algorithms.
arXiv Detail & Related papers (2022-05-31T18:44:13Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
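One plausible reading of the AdaRem summary above (an assumption on my part, not the paper's exact update rule) is a per-parameter step scale driven by the sign agreement between the accumulated past changes and the current gradient:

```python
# Sketch of an alignment-scaled, per-parameter step; illustrative reading of
# the summary above, not the published AdaRem algorithm.
import numpy as np

def alignment_scaled_step(w, grad, trace, lr=0.1, beta=0.9):
    agree = np.sign(trace) * np.sign(-grad)     # +1 if past change matches -grad
    scale = 1.0 + 0.5 * agree                   # speed up aligned, damp opposed
    step = -lr * scale * grad
    trace = beta * trace + (1.0 - beta) * step  # memory of past changes
    return w + step, trace

target = np.array([1.0, -2.0, 0.5])
w, trace = np.zeros(3), np.zeros(3)
for _ in range(50):
    grad = 2.0 * (w - target)                   # gradient of a toy quadratic
    w, trace = alignment_scaled_step(w, grad, trace)
print(w)                                        # approaches [1, -2, 0.5]
```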
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
- Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm that iteratively partitions the verification problem and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z)
- Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z)
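The grafting experiment described above has a simple recipe: take the step magnitude from one optimizer and the step direction from another. A sketch of that recombination (the example step vectors and the choice of optimizers are hypothetical):

```python
# Grafting: combine optimizer A's step norm with optimizer B's step direction.
import numpy as np

def graft(magnitude_step, direction_step, eps=1e-12):
    scale = np.linalg.norm(magnitude_step) / (np.linalg.norm(direction_step) + eps)
    return scale * direction_step

sgd_step  = np.array([-0.10, 0.05, -0.02])   # hypothetical SGD step (gives size)
adam_step = np.array([-0.03, 0.04, -0.04])   # hypothetical Adam step (gives direction)
grafted = graft(sgd_step, adam_step)
print(np.linalg.norm(grafted), np.linalg.norm(sgd_step))  # norms match
```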
- Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated in a backward fashion and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
arXiv Detail & Related papers (2020-02-10T03:26:09Z)
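The proximal-mapping update mentioned above can be illustrated in its simplest form: an implicit step that minimizes a linearized loss plus a quadratic distance term and an L2 penalty, which has a closed form. This is a generic proximal gradient step, not the paper's semi-implicit scheme.

```python
# Generic proximal step: argmin_v  g.(v - w) + ||v - w||^2 / (2*lr) + (wd/2)*||v||^2
# has the closed form (w - lr*g) / (1 + lr*wd).
import numpy as np

def proximal_step(w, grad, lr=0.1, weight_decay=0.01):
    return (w - lr * grad) / (1.0 + lr * weight_decay)

w = np.array([1.0, -2.0, 0.5])
grad = np.array([0.2, -0.1, 0.4])
print(proximal_step(w, grad))
```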