Predictive Coding beyond Gaussian Distributions
- URL: http://arxiv.org/abs/2211.03481v1
- Date: Mon, 7 Nov 2022 12:02:05 GMT
- Title: Predictive Coding beyond Gaussian Distributions
- Authors: Luca Pinchetti, Tommaso Salvatori, Yordan Yordanov, Beren Millidge,
Yuhang Song, Thomas Lukasiewicz
- Abstract summary: Predictive coding (PC) is a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models.
These methods fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions.
We show that our method allows us to train transformer networks and achieve a performance comparable with BP on conditional language models.
- Score: 38.51699576854394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large amount of recent research has the far-reaching goal of finding
training methods for deep neural networks that can serve as alternatives to
backpropagation (BP). A prominent example is predictive coding (PC), which is a
neuroscience-inspired method that performs inference on hierarchical Gaussian
generative models. These methods, however, fail to keep up with modern neural
networks, as they are unable to replicate the dynamics of complex layers and
activation functions. In this work, we solve this problem by generalizing PC to
arbitrary probability distributions, enabling the training of architectures,
such as transformers, that are hard to approximate with only Gaussian
assumptions. We perform three experimental analyses. First, we study the gap
between our method and the standard formulation of PC on multiple toy examples.
Second, we test the reconstruction quality on variational autoencoders, where
our method reaches the same reconstruction quality as BP. Third, we show that
our method allows us to train transformer networks and achieve a performance
comparable with BP on conditional language models. More broadly, this method
allows neuroscience-inspired learning to be applied to multiple domains, since
the internal distributions can be flexibly adapted to the data, tasks, and
architectures used.
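The generalization described in the abstract can be made concrete with a small sketch: under the standard Gaussian assumption a layer's contribution to the energy is a squared prediction error, whereas a categorical (softmax) output layer contributes a cross-entropy term, and inference still descends the same total energy with respect to the latent states. The two-layer setup, dimensions, and update rule below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of predictive coding energies beyond the Gaussian assumption.
# The two-layer setup, dimensions, and update rule are illustrative assumptions,
# not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gaussian_energy(x, pred):
    """Standard PC layer: Gaussian likelihood -> squared prediction error."""
    return 0.5 * np.sum((x - pred) ** 2)

def categorical_energy(target_onehot, logits):
    """Generalized PC output layer: categorical likelihood -> cross-entropy."""
    return -np.sum(target_onehot * np.log(softmax(logits) + 1e-12))

# Toy model: a latent state x1 is predicted (Gaussian layer) from the clamped
# input x0 and itself predicts a categorical output clamped to a one-hot target.
W1 = rng.normal(scale=0.1, size=(8, 4))   # input -> latent prediction weights
W2 = rng.normal(scale=0.1, size=(4, 3))   # latent -> output logits weights
x0 = rng.normal(size=8)                   # clamped input
y = np.array([0.0, 1.0, 0.0])             # clamped one-hot target
x1 = W1.T @ x0                            # initialize the latent at its prediction

def total_energy(x1):
    return gaussian_energy(x1, W1.T @ x0) + categorical_energy(y, W2.T @ x1)

# Inference phase: relax the latent state by descending the total energy.
lr = 0.1
for _ in range(50):
    pred_err = x1 - W1.T @ x0                   # d(gaussian_energy)/d(x1)
    out_err = W2 @ (softmax(W2.T @ x1) - y)     # d(categorical_energy)/d(x1)
    x1 -= lr * (pred_err + out_err)

print("energy after inference:", total_energy(x1))
```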
Related papers
- Predictive Coding Networks and Inference Learning: Tutorial and Survey [0.7510165488300368]
Predictive coding networks (PCNs) are based on the neuroscientific framework of predictive coding.
Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm.
As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling.
arXiv Detail & Related papers (2024-07-04T18:39:20Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Quantum-Aided Meta-Learning for Bayesian Binary Neural Networks via Born
Machines [38.467834562966594]
This paper studies the use of Born machines for the problem of training binary Bayesian neural networks.
A Born machine is used to model the variational distribution of the binary weights of the neural network.
The method combines gradient-based meta-learning and variational inference via Born machines, and is shown to outperform conventional joint learning strategies.
arXiv Detail & Related papers (2022-03-31T15:09:04Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Analytically Tractable Bayesian Deep Q-Learning [0.0]
We adapt the temporal difference Q-learning framework to make it compatible with tractable approximate Gaussian inference (TAGI).
We demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks.
arXiv Detail & Related papers (2021-06-21T13:11:52Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - The Gaussian equivalence of generative models for learning with shallow
neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous analytical and numerical evidence corroborating this Gaussian equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents (the standard Gaussian PC loop is sketched after this list).
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
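For contrast with the generalized method above, and as background for the "Predictive Coding Approximates Backprop along Arbitrary Computation Graphs" entry in the list, the sketch below shows the standard Gaussian predictive coding loop on a tiny MLP: an inference phase that relaxes the hidden value node toward low prediction error, followed by local, Hebbian-like weight updates. Network sizes, the tanh nonlinearity, learning rates, and iteration counts are assumptions for illustration, not taken from that paper.

```python
# Minimal sketch of the standard Gaussian predictive coding loop on a tiny MLP.
# Sizes, nonlinearity, learning rates, and iteration counts are assumptions.
import numpy as np

rng = np.random.default_rng(1)
sizes = [8, 16, 4]                                       # input, hidden, output
W = [rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1])) for i in range(2)]

def pc_step(x_in, y_target, W, infer_lr=0.1, weight_lr=0.01, T=20):
    # Value nodes: input clamped to the data, output clamped to the target.
    x = [x_in, np.tanh(x_in @ W[0]), y_target]
    for _ in range(T):                                   # inference phase
        e1 = x[1] - np.tanh(x[0] @ W[0])                 # hidden prediction error
        e2 = x[2] - np.tanh(x[1] @ W[1])                 # output prediction error
        d2 = e2 * (1 - np.tanh(x[1] @ W[1]) ** 2)        # error through tanh
        x[1] -= infer_lr * (e1 - d2 @ W[1].T)            # descend the total energy
    # Local, Hebbian-like weight updates from the settled prediction errors.
    e1 = x[1] - np.tanh(x[0] @ W[0])
    e2 = x[2] - np.tanh(x[1] @ W[1])
    W[0] += weight_lr * np.outer(x[0], e1 * (1 - np.tanh(x[0] @ W[0]) ** 2))
    W[1] += weight_lr * np.outer(x[1], e2 * (1 - np.tanh(x[1] @ W[1]) ** 2))
    return 0.5 * (np.sum(e1 ** 2) + np.sum(e2 ** 2))     # total energy

x_in, y_target = rng.normal(size=8), rng.normal(size=4)
for _ in range(5):
    print("energy:", pc_step(x_in, y_target, W))
```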