Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing
its Gradient Estimator Bias
- URL: http://arxiv.org/abs/2006.03824v1
- Date: Sat, 6 Jun 2020 09:36:07 GMT
- Title: Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing
its Gradient Estimator Bias
- Authors: Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio,
Julie Grollier and Damien Querlioz
- Abstract summary: In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
- Score: 65.13042449121411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Equilibrium Propagation (EP) is a biologically-inspired algorithm for
convergent RNNs with a local learning rule that comes with strong theoretical
guarantees. The parameter updates of the neural network during the credit
assignment phase have been shown mathematically to approach the gradients
provided by Backpropagation Through Time (BPTT) when the network is
infinitesimally nudged toward its target. In practice, however, training a
network with the gradient estimates provided by EP does not scale to visual
tasks harder than MNIST. In this work, we show that a bias in the gradient
estimate of EP, inherent in the use of finite nudging, is responsible for this
phenomenon and that cancelling it allows training deep ConvNets by EP. We show
that this bias can be greatly reduced by using symmetric nudging (a positive
nudging and a negative one). We also generalize previous EP equations to the
case of cross-entropy loss (as opposed to squared error). As a result of
these advances, we achieve a test error of 11.7% on CIFAR-10 by EP,
approaching the error achieved by BPTT and vastly improving on the
standard EP approach with same-sign nudging, which gives 86% test
error. We also apply these techniques to train an architecture with asymmetric
forward and backward connections, yielding a 13.2% test error. These results
highlight EP as a compelling biologically-plausible approach to compute error
gradients in deep neural networks.
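The bias cancellation at the heart of the paper is easy to state: the standard one-sided EP estimate, (1/beta) * (dPhi/dtheta at +beta minus dPhi/dtheta at 0), carries an O(beta) error, while the symmetric estimate, (1/(2*beta)) * (dPhi/dtheta at +beta minus dPhi/dtheta at -beta), cancels it, leaving an O(beta^2) error. Below is a minimal numpy sketch of the two estimators on a hypothetical scalar energy model (the energy, cost, and constants are illustrative stand-ins, not the paper's ConvNet), chosen so the true gradient is known in closed form:

```python
import numpy as np

# Toy convergent system: energy E(theta, s) = 0.5*(s - theta*x)^2 and
# cost C(s) = 0.5*(s - y)^2. Illustrative stand-in chosen so that the
# true loss gradient dC(s_free)/dtheta = (theta*x - y)*x is analytic.
x, y, theta, beta = 1.5, 2.0, 0.3, 0.2

def relax(nudge, s=0.0, lr=0.1, steps=500):
    """Relax s to a fixed point of F = E + nudge*C by gradient descent."""
    for _ in range(steps):
        dF_ds = (s - theta * x) + nudge * (s - y)
        s -= lr * dF_ds
    return s

def dE_dtheta(s):
    # Partial derivative of the energy with respect to the parameter.
    return -(s - theta * x) * x

s_free  = relax(0.0)      # free phase
s_plus  = relax(+beta)    # positively nudged phase
s_minus = relax(-beta)    # negatively nudged phase (the paper's addition)

true_grad     = (theta * x - y) * x
one_sided_est = (dE_dtheta(s_plus) - dE_dtheta(s_free)) / beta        # O(beta) bias
symmetric_est = (dE_dtheta(s_plus) - dE_dtheta(s_minus)) / (2 * beta) # O(beta^2) bias

print(f"true gradient: {true_grad:.4f}")      # -2.3250
print(f"one-sided EP : {one_sided_est:.4f}")  # about -1.94
print(f"symmetric EP : {symmetric_est:.4f}")  # about -2.42
```

Even at this moderate beta, the symmetric estimate sits much closer to the true gradient; scaled up to deep ConvNets, this bias reduction is what makes CIFAR-10 training by EP feasible.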
Related papers
- Scaling SNNs Trained Using Equilibrium Propagation to Convolutional Architectures [2.2146860305758485]
Equilibrium Propagation (EP) is a biologically plausible local learning algorithm initially developed for convergent recurrent neural networks (RNNs).
EP is a powerful candidate for training Spiking Neural Networks (SNNs), which are commonly trained by BPTT.
We provide a formulation for training convolutional spiking convergent RNNs using EP, bridging the gap between spiking and non-spiking convergent RNNs.
arXiv Detail & Related papers (2024-05-04T03:06:14Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases adversarial accuracy by 0.1 to 0.3 percentage points in various state-of-the-art adversarially trained models.
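The extreme-value idea can be made concrete: gradient norms sampled near a point are bounded above by the local Lipschitz constant, so batch maxima of those norms approximately follow a reverse Weibull distribution whose location parameter estimates that bound. A hedged numpy/scipy sketch on a toy one-dimensional function follows (this illustrates the general estimator family, in the spirit of CLEVER-style estimation, and is not necessarily the paper's exact procedure):

```python
import numpy as np
from scipy.stats import weibull_max

rng = np.random.default_rng(0)

def grad_norm(x):
    # |f'(x)| for f(x) = sin(3x) + 0.5x, so f'(x) = 3*cos(3x) + 0.5;
    # the true local Lipschitz constant on [-1, 1] is 3.5 (at x = 0).
    return np.abs(3.0 * np.cos(3.0 * x) + 0.5)

# Batch maxima of sampled gradient norms near x0.
x0, radius = 0.0, 1.0
batch_maxima = [grad_norm(x0 + rng.uniform(-radius, radius, 512)).max()
                for _ in range(200)]

# Fit a reverse Weibull; its location parameter (the right endpoint of
# the support) is the extreme-value estimate of the Lipschitz constant.
c, loc, scale = weibull_max.fit(batch_maxima)
print(f"estimated local Lipschitz constant: {loc:.3f}  (true: 3.5)")
```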
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [62.43908463620527]
In practice, EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
These results highlight EP as a scalable approach to compute error gradients in deep neural networks, thereby motivating its hardware implementation.
arXiv Detail & Related papers (2021-01-14T10:23:40Z)
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z)
- Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets [1.7259824817932292]
We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss.
This paper shows that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate.
arXiv Detail & Related papers (2020-10-24T14:34:56Z)
- PEP: Parameter Ensembling by Perturbation [13.221295194854642]
Parameter Ensembling by Perturbation (PEP) constructs an ensemble of parameter values as random perturbations of the optimal parameter set from training.
PEP provides a small improvement in performance, and, in some cases, a substantial improvement in empirical calibration.
PEP can be used to probe the level of overfitting that occurred during training.
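PEP as summarized above admits a very small sketch: draw ensemble members as Gaussian perturbations of the trained weights and average their predictive probabilities. The toy logistic model, theta_star, and sigma below are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(theta, X):
    # Predictive probabilities of a toy logistic model.
    return 1.0 / (1.0 + np.exp(-X @ theta))

theta_star = np.array([1.0, -2.0, 0.5])  # stand-in for trained weights
X = rng.normal(size=(4, 3))              # a few test inputs

# PEP ensemble: random Gaussian perturbations around the trained optimum.
sigma, members = 0.1, 32
ensemble = [theta_star + sigma * rng.normal(size=theta_star.shape)
            for _ in range(members)]

p_point = predict(theta_star, X)                            # point estimate
p_pep = np.mean([predict(t, X) for t in ensemble], axis=0)  # PEP average
print("point:", np.round(p_point, 3))
print("PEP  :", np.round(p_pep, 3))
```

Averaging over perturbed parameters smooths the predictive distribution, which is one intuition for the reported calibration improvement.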
arXiv Detail & Related papers (2020-10-24T00:16:03Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
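One way to see what "a bit Bayesian" buys is a last-layer Laplace approximation: fit the output layer to its MAP estimate, approximate its posterior by a Gaussian whose covariance is the inverse Hessian, and use the standard probit approximation for the predictive. The random-feature toy below is a sketch under those assumptions (data, feature map, and prior scale are illustrative, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary data and a fixed random ReLU feature map (stand-ins).
X = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])[:, None]
y = np.concatenate([np.zeros(50), np.ones(50)])
W1, b1 = rng.normal(size=(1, 20)), rng.normal(size=20)
phi = lambda x: np.maximum(x @ W1 + b1, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# MAP fit of the last layer (logistic regression with Gaussian prior).
F, tau, w = phi(X), 1.0, np.zeros(20)
for _ in range(2000):
    p = sigmoid(F @ w)
    w -= 0.1 * (F.T @ (p - y) + tau * w) / len(y)

# Laplace approximation: Gaussian posterior with covariance equal to the
# inverse Hessian of the negative log posterior at the MAP estimate.
p = sigmoid(F @ w)
Sigma = np.linalg.inv(F.T @ (F * (p * (1 - p))[:, None]) + tau * np.eye(20))

def laplace_predict(x):
    f = phi(x)
    m = f @ w                             # predictive logit mean
    s2 = np.sum((f @ Sigma) * f, axis=1)  # predictive logit variance
    return sigmoid(m / np.sqrt(1.0 + np.pi * s2 / 8.0))  # probit approx.

# Far from the data the Laplace predictive is less extreme than the
# MAP point estimate, which is the overconfidence fix in a nutshell.
for x0 in (0.0, 2.0, 20.0):
    xa = np.array([[x0]])
    print(f"x={x0:5.1f}  MAP={sigmoid(phi(xa) @ w)[0]:.3f}  "
          f"Laplace={laplace_predict(xa)[0]:.3f}")
```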
arXiv Detail & Related papers (2020-02-24T08:52:06Z)