Predictive Coding Approximates Backprop along Arbitrary Computation Graphs
- URL: http://arxiv.org/abs/2006.04182v5
- Date: Mon, 5 Oct 2020 18:11:05 GMT
- Title: Predictive Coding Approximates Backprop along Arbitrary Computation Graphs
- Authors: Beren Millidge, Alexander Tschantz, Christopher L. Buckley
- Abstract summary: We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backpropagation of error (backprop) is a powerful algorithm for training
machine learning architectures through end-to-end differentiation. However,
backprop is often criticised for lacking biological plausibility. Recently, it
has been shown that backprop in multilayer perceptrons (MLPs) can be
approximated using predictive coding, a biologically plausible process theory
of cortical computation which relies only on local and Hebbian updates. The
power of backprop, however, lies not in its instantiation in MLPs, but rather
in the concept of automatic differentiation which allows for the optimisation
of any differentiable program expressed as a computation graph. Here, we
demonstrate that predictive coding converges asymptotically (and in practice
rapidly) to exact backprop gradients on arbitrary computation graphs using only
local learning rules. We apply this result to develop a straightforward
strategy to translate core machine learning architectures into their predictive
coding equivalents. We construct predictive coding CNNs, RNNs, and the more
complex LSTMs, which include a non-layer-like branching internal graph
structure and multiplicative interactions. Our models perform equivalently to
backprop on challenging machine learning benchmarks, while utilising only local
and (mostly) Hebbian plasticity. Our method raises the potential that standard
machine learning algorithms could in principle be directly implemented in
neural circuitry, and may also contribute to the development of completely
distributed neuromorphic architectures.
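As a concrete illustration of the core claim, here is a minimal numpy sketch (not the authors' released code) of supervised predictive coding on a small MLP with a squared-error loss. It uses a fixed-prediction style of inference, freezing the feedforward predictions while the value nodes relax under purely local updates, and then compares the equilibrium prediction errors against backprop's deltas; the layer sizes, activation, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):               # activation function (illustrative choice)
    return np.tanh(x)

def df(x):              # its derivative
    return 1.0 - np.tanh(x) ** 2

# A toy 3-weight-layer MLP with a linear output and MSE loss L = 0.5 * ||y - t||^2.
sizes = [4, 8, 6, 3]
W = [rng.normal(size=(sizes[l + 1], sizes[l])) / np.sqrt(sizes[l]) for l in range(3)]
x0 = rng.normal(size=sizes[0])          # input
t = rng.normal(size=sizes[-1])          # target

# Forward pass: a[l] are post-activations (a[0] is the input), y is the linear output.
a = [x0, f(W[0] @ x0)]
a.append(f(W[1] @ a[1]))
y = W[2] @ a[2]

# Reference: backprop deltas (gradients w.r.t. the pre-activations).
delta = [None, None, y - t]
delta[1] = df(W[1] @ a[1]) * (W[2].T @ delta[2])
delta[0] = df(W[0] @ a[0]) * (W[1].T @ delta[1])

# Predictive coding inference: one value node per pre-activation, with the output node
# clamped to the target and the predictions mu frozen at their feedforward values.
mu = [W[0] @ a[0], W[1] @ a[1], y]
v = [mu[0].copy(), mu[1].copy(), t]
eta = 0.1
for _ in range(300):
    eps = [v[l] - mu[l] for l in range(3)]
    # Each hidden node only uses its own error and the error one layer above (local updates).
    v[0] += eta * (-eps[0] + df(mu[0]) * (W[1].T @ eps[1]))
    v[1] += eta * (-eps[1] + df(mu[1]) * (W[2].T @ eps[2]))
eps = [v[l] - mu[l] for l in range(3)]

# At equilibrium the negated prediction errors match backprop's deltas; the Hebbian weight
# update (error times presynaptic activity) then recovers the backprop weight gradient.
for l in range(3):
    rel = np.linalg.norm(-eps[l] - delta[l]) / np.linalg.norm(delta[l])
    print(f"layer {l}: relative difference from backprop = {rel:.2e}")
```

The printed relative differences shrink toward numerical precision as the relaxation settles, which is the sense in which the prediction errors converge to backprop's gradients.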
Related papers
- pyhgf: A neural network library for predictive coding [0.2150989251218736]
pyhgf is a Python package for creating, manipulating and sampling dynamic networks for predictive coding.
We improve over other frameworks by enclosing the network components as transparent, modular and malleable variables in the message-passing steps.
The transparency of core variables can also translate into inference processes that leverage self-organisation principles.
arXiv Detail & Related papers (2024-10-11T19:21:38Z)
- Predictive Coding beyond Gaussian Distributions [38.51699576854394]
Predictive coding (PC) is a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models.
Such Gaussian models fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions.
We show that our method allows us to train transformer networks and achieve a performance comparable with BP on conditional language models.
arXiv Detail & Related papers (2022-11-07T12:02:05Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
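The entry above concerns a provably efficient exploration variant of Q-learning with linear function approximation. For reference, here is a minimal sketch of the standard epsilon-greedy, semi-gradient linear Q-learning baseline that such work builds on; the chain MDP, random feature map, and hyperparameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 5-state chain MDP with a random linear feature map phi(s, a),
# trained with epsilon-greedy, semi-gradient Q-learning on Q(s, a) = w . phi(s, a).
n_states, n_actions, n_features = 5, 2, 4
phi = rng.normal(size=(n_states, n_actions, n_features))
w = np.zeros(n_features)

def q(s, a):
    return phi[s, a] @ w

def step(s, a):
    # action 0 moves left, action 1 moves right; reward for reaching the right end
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s_next, float(s_next == n_states - 1)

alpha, gamma, eps_greedy = 0.05, 0.9, 0.1
s = 0
for _ in range(5000):
    if rng.random() < eps_greedy:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax([q(s, b) for b in range(n_actions)]))
    s_next, r = step(s, a)
    td_target = r + gamma * max(q(s_next, b) for b in range(n_actions))
    w += alpha * (td_target - q(s, a)) * phi[s, a]   # semi-gradient TD update
    s = s_next

print("learned weights:", w)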
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Relaxing the Constraints on Predictive Coding Models [62.997667081978825]
Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs is the minimization of prediction errors.
Standard implementations of the algorithm still involve potentially neurally implausible features such as identical forward and backward weights, backward nonlinear derivatives, and 1-1 error unit connectivity.
In this paper, we show that these features are not integral to the algorithm and can be removed either directly or through learning additional sets of parameters with Hebbian update rules without noticeable harm to learning performance.
arXiv Detail & Related papers (2020-10-02T15:21:37Z)
- Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z)
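The Activation Relaxation entry above constructs the backprop gradient as the equilibrium of a dynamical system. The sketch below illustrates that idea for a two-layer network under simplifying assumptions; it is not the paper's exact dynamics, only a leaky relaxation whose fixed point reproduces the backprop gradient with respect to the hidden activations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch under simplifying assumptions: for a feedforward net, the backprop quantity
# g_l = dL/da_l obeys g_l = W_{l+1}^T (f'(z_{l+1}) * g_{l+1}). A leaky relaxation with the
# top node clamped to dL/da_L has exactly that recursion as its fixed point.
f, df = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2
W1 = rng.normal(size=(7, 5)) / np.sqrt(5)
W2 = rng.normal(size=(3, 7)) / np.sqrt(7)
a0, t = rng.normal(size=5), rng.normal(size=3)

z1 = W1 @ a0; a1 = f(z1)
z2 = W2 @ a1; a2 = f(z2)

g2 = a2 - t                      # dL/da2 for L = 0.5 * ||a2 - t||^2
g1 = W2.T @ (df(z2) * g2)        # reference backprop gradient dL/da1

x1, x2 = np.zeros_like(a1), g2   # x2 clamped; x1 relaxes toward the fixed point
dt = 0.2
for _ in range(200):
    x1 += dt * (-x1 + W2.T @ (df(z2) * x2))

print("relative error vs backprop:", np.linalg.norm(x1 - g1) / np.linalg.norm(g1))
```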
- Randomized Automatic Differentiation [22.95414996614006]
We develop a general framework and approach for randomized automatic differentiation (RAD).
RAD allows unbiased gradient estimates to be computed with reduced memory in exchange for increased variance.
We show that RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks.
arXiv Detail & Related papers (2020-07-20T19:03:44Z)
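The snippet below illustrates the memory-for-variance idea behind the RAD entry above in the simplest possible setting (a single linear layer with hypothetical sizes, not the paper's general framework): the forward pass is exact, but only a randomly sparsified copy of the activation is stored for the backward pass, giving an unbiased, lower-memory gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal illustration: exact forward pass, randomly sparsified stored activation for the
# backward pass. The gradient estimate is unbiased but higher-variance, and the stored
# tensor is (on average) keep_prob times the original size.
n_in, n_out, keep_prob = 256, 64, 0.25
W = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)
x = rng.normal(size=n_in)
target = rng.normal(size=n_out)

y = W @ x                              # exact forward pass
delta = y - target                     # dL/dy for L = 0.5 * ||y - target||^2

grad_exact = np.outer(delta, x)        # exact dL/dW

mask = rng.random(n_in) < keep_prob
x_stored = np.where(mask, x / keep_prob, 0.0)   # unbiased sparsification: E[x_stored] = x
grad_rad = np.outer(delta, x_stored)            # unbiased estimate of dL/dW

rel = np.linalg.norm(grad_rad - grad_exact) / np.linalg.norm(grad_exact)
print("relative deviation of the randomized estimate:", rel)
```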