A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks
- URL: http://arxiv.org/abs/2212.00720v2
- Date: Wed, 7 Feb 2024 13:01:24 GMT
- Title: A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks
- Authors: Tommaso Salvatori, Yuhang Song, Yordan Yordanov, Beren Millidge,
Zhenghua Xu, Lei Sha, Cornelius Emde, Rafal Bogacz, Thomas Lukasiewicz
- Abstract summary: Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
- Score: 65.34977803841007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predictive coding networks are neuroscience-inspired models with roots in
both Bayesian statistics and neuroscience. Training such models, however, is
quite inefficient and unstable. In this work, we show how simply changing
the temporal scheduling of the update rule for the synaptic weights leads to an
algorithm that is much more efficient and stable than the original one, and has
theoretical guarantees in terms of convergence. The proposed algorithm, which we
call incremental predictive coding (iPC), is also more biologically plausible
than the original one, as it is fully automatic. In an extensive set of
experiments, we show that iPC consistently performs better than the original
formulation on a large number of benchmarks for image classification, as well
as for the training of both conditional and masked language models, in terms of
test accuracy, efficiency, and convergence with respect to a large set of
hyperparameters.
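The scheduling change described in the abstract can be made concrete with a short sketch. The code below is not the authors' implementation; it is a minimal illustration on a toy two-layer linear predictive coding network, with made-up layer sizes, learning rates, and variable names, where the only difference between the two modes is when the synaptic weights are updated.

    # Minimal sketch (not the authors' code): standard PC vs. iPC on a toy
    # two-layer linear network. Layer sizes, learning rates, and the number of
    # inference steps are illustrative assumptions; only the scheduling of the
    # synaptic (weight) updates differs between the two modes.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid, d_out = 4, 8, 2
    W0 = rng.normal(scale=0.1, size=(d_hid, d_in))   # predicts hidden layer from the input
    W1 = rng.normal(scale=0.1, size=(d_out, d_hid))  # predicts output layer from the hidden layer
    lr_x, lr_w, T = 0.2, 0.01, 32                    # inference rate, learning rate, inference steps

    def present(x_in, x_target, W0, W1, incremental):
        """Train on one (input, target) pair and return the updated weights."""
        x_hid = W0 @ x_in                            # initialise hidden value nodes at their prediction
        for _ in range(T):
            e_hid = x_hid - W0 @ x_in                # prediction error at the hidden layer
            e_out = x_target - W1 @ x_hid            # prediction error at the clamped output layer
            x_hid = x_hid + lr_x * (W1.T @ e_out - e_hid)   # inference: relax the hidden activities
            if incremental:                          # iPC: synaptic update at every inference step
                W0 = W0 + lr_w * np.outer(e_hid, x_in)
                W1 = W1 + lr_w * np.outer(e_out, x_hid)
        if not incremental:                          # standard PC: one synaptic update after inference
            W0 = W0 + lr_w * np.outer(x_hid - W0 @ x_in, x_in)
            W1 = W1 + lr_w * np.outer(x_target - W1 @ x_hid, x_hid)
        return W0, W1

    x, y = rng.normal(size=d_in), rng.normal(size=d_out)
    W0, W1 = present(x, y, W0, W1, incremental=True)   # iPC scheduling
    W0, W1 = present(x, y, W0, W1, incremental=False)  # original two-phase scheduling

In the original two-phase scheduling, the network must know when the relaxation has finished before the weights may change; updating the weights at every inference step removes that external control signal, which is the sense in which the abstract calls iPC fully automatic.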
Related papers
- Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network [9.48424754175943]
We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks.
We show that RAMDA attains the ideal structure induced by the regularizer at the stationary point of convergence.
Experiments in large-scale modern computer vision, language modeling, and speech tasks show that the proposed RAMDA is efficient and consistently outperforms the state of the art for training structured neural networks.
arXiv Detail & Related papers (2024-03-21T13:43:49Z) - qecGPT: decoding Quantum Error-correcting Codes with Generative
Pre-trained Transformers [5.392298820599664]
We propose a framework for decoding quantum error-correcting codes with generative modeling.
We use autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes.
Our framework is general and can be applied to any error model and quantum codes with different topologies.
arXiv Detail & Related papers (2023-07-18T07:34:02Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Convolutional Sparse Coding Fast Approximation with Application to
Seismic Reflectivity Estimation [9.005280130480308]
We propose a sped-up version of the classic iterative thresholding algorithm that produces a good approximation of the convolutional sparse code within 2-5 iterations.
The performance of the proposed solution is demonstrated via the seismic inversion problem in both synthetic and real data scenarios.
arXiv Detail & Related papers (2021-06-29T12:19:07Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds and still achieves a linear speedup in theory.
Our experiments on several benchmark datasets show the effectiveness of our algorithm and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)