Predictive Coding beyond Gaussian Distributions
- URL: http://arxiv.org/abs/2211.03481v1
- Date: Mon, 7 Nov 2022 12:02:05 GMT
- Title: Predictive Coding beyond Gaussian Distributions
- Authors: Luca Pinchetti, Tommaso Salvatori, Yordan Yordanov, Beren Millidge,
Yuhang Song, Thomas Lukasiewicz
- Abstract summary: Predictive coding (PC) is a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models.
These methods fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions.
We show that our method allows us to train transformer networks and achieve a performance comparable with BP on conditional language models.
- Score: 38.51699576854394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large amount of recent research has the far-reaching goal of finding
training methods for deep neural networks that can serve as alternatives to
backpropagation (BP). A prominent example is predictive coding (PC), which is a
neuroscience-inspired method that performs inference on hierarchical Gaussian
generative models. These methods, however, fail to keep up with modern neural
networks, as they are unable to replicate the dynamics of complex layers and
activation functions. In this work, we solve this problem by generalizing PC to
arbitrary probability distributions, enabling the training of architectures,
such as transformers, that are hard to approximate with only Gaussian
assumptions. We perform three experimental analyses. First, we study the gap
between our method and the standard formulation of PC on multiple toy examples.
Second, we test the reconstruction quality on variational autoencoders, where
our method reaches the same reconstruction quality as BP. Third, we show that
our method allows us to train transformer networks and achieve a performance
comparable with BP on conditional language models. More broadly, this method
allows neuroscience-inspired learning to be applied to multiple domains, since
the internal distributions can be flexibly adapted to the data, tasks, and
architectures used.
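The generalization described in the abstract can be made concrete with a small sketch: under the standard Gaussian assumption a layer's contribution to the energy is a squared prediction error, whereas a categorical (softmax) output layer contributes a cross-entropy term, and inference still descends the same total energy with respect to the latent states. The two-layer setup, dimensions, and update rule below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of predictive coding energies beyond the Gaussian assumption.
# The two-layer setup, dimensions, and update rule are illustrative assumptions,
# not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gaussian_energy(x, pred):
    """Standard PC layer: Gaussian likelihood -> squared prediction error."""
    return 0.5 * np.sum((x - pred) ** 2)

def categorical_energy(target_onehot, logits):
    """Generalized PC output layer: categorical likelihood -> cross-entropy."""
    return -np.sum(target_onehot * np.log(softmax(logits) + 1e-12))

# Toy model: a latent state x1 is predicted (Gaussian layer) from the clamped
# input x0 and itself predicts a categorical output clamped to a one-hot target.
W1 = rng.normal(scale=0.1, size=(8, 4))   # input -> latent prediction weights
W2 = rng.normal(scale=0.1, size=(4, 3))   # latent -> output logits weights
x0 = rng.normal(size=8)                   # clamped input
y = np.array([0.0, 1.0, 0.0])             # clamped one-hot target
x1 = W1.T @ x0                            # initialize the latent at its prediction

def total_energy(x1):
    return gaussian_energy(x1, W1.T @ x0) + categorical_energy(y, W2.T @ x1)

# Inference phase: relax the latent state by descending the total energy.
lr = 0.1
for _ in range(50):
    pred_err = x1 - W1.T @ x0                   # d(gaussian_energy)/d(x1)
    out_err = W2 @ (softmax(W2.T @ x1) - y)     # d(categorical_energy)/d(x1)
    x1 -= lr * (pred_err + out_err)

print("energy after inference:", total_energy(x1))
```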
Related papers
- Predictive Coding Networks and Inference Learning: Tutorial and Survey [0.7510165488300368]
Predictive coding networks (PCNs) are based on the neuroscientific framework of predictive coding.
Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm.
As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling.
arXiv Detail & Related papers (2024-07-04T18:39:20Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Quantum-Aided Meta-Learning for Bayesian Binary Neural Networks via Born
Machines [38.467834562966594]
This paper studies the use of Born machines for the problem of training binary Bayesian neural networks.
A Born machine is used to model the variational distribution of the binary weights of the neural network.
The method combines gradient-based meta-learning and variational inference via Born machines, and is shown to outperform conventional joint learning strategies.
arXiv Detail & Related papers (2022-03-31T15:09:04Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Analytically Tractable Bayesian Deep Q-Learning [0.0]
We adapt the temporal difference Q-learning framework to make it compatible with tractable approximate Gaussian inference (TAGI).
We demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks.
arXiv Detail & Related papers (2021-06-21T13:11:52Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - The Gaussian equivalence of generative models for learning with shallow
neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous analytical and numerical evidence corroborating this Gaussian equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents (the standard Gaussian PC loop is sketched after this list).
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
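For contrast with the generalized method above, and as background for the "Predictive Coding Approximates Backprop along Arbitrary Computation Graphs" entry in the list, the sketch below shows the standard Gaussian predictive coding loop on a tiny MLP: an inference phase that relaxes the hidden value node toward low prediction error, followed by local, Hebbian-like weight updates. Network sizes, the tanh nonlinearity, learning rates, and iteration counts are assumptions for illustration, not taken from that paper.

```python
# Minimal sketch of the standard Gaussian predictive coding loop on a tiny MLP.
# Sizes, nonlinearity, learning rates, and iteration counts are assumptions.
import numpy as np

rng = np.random.default_rng(1)
sizes = [8, 16, 4]                                       # input, hidden, output
W = [rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1])) for i in range(2)]

def pc_step(x_in, y_target, W, infer_lr=0.1, weight_lr=0.01, T=20):
    # Value nodes: input clamped to the data, output clamped to the target.
    x = [x_in, np.tanh(x_in @ W[0]), y_target]
    for _ in range(T):                                   # inference phase
        e1 = x[1] - np.tanh(x[0] @ W[0])                 # hidden prediction error
        e2 = x[2] - np.tanh(x[1] @ W[1])                 # output prediction error
        d2 = e2 * (1 - np.tanh(x[1] @ W[1]) ** 2)        # error through tanh
        x[1] -= infer_lr * (e1 - d2 @ W[1].T)            # descend the total energy
    # Local, Hebbian-like weight updates from the settled prediction errors.
    e1 = x[1] - np.tanh(x[0] @ W[0])
    e2 = x[2] - np.tanh(x[1] @ W[1])
    W[0] += weight_lr * np.outer(x[0], e1 * (1 - np.tanh(x[0] @ W[0]) ** 2))
    W[1] += weight_lr * np.outer(x[1], e2 * (1 - np.tanh(x[1] @ W[1]) ** 2))
    return 0.5 * (np.sum(e1 ** 2) + np.sum(e2 ** 2))     # total energy

x_in, y_target = rng.normal(size=8), rng.normal(size=4)
for _ in range(5):
    print("energy:", pc_step(x_in, y_target, W))
```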