A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks
- URL: http://arxiv.org/abs/2212.00720v2
- Date: Wed, 7 Feb 2024 13:01:24 GMT
- Title: A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks
- Authors: Tommaso Salvatori, Yuhang Song, Yordan Yordanov, Beren Millidge,
Zhenghua Xu, Lei Sha, Cornelius Emde, Rafal Bogacz, Thomas Lukasiewicz
- Abstract summary: Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
- Score: 65.34977803841007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predictive coding networks are neuroscience-inspired models with roots in
both Bayesian statistics and neuroscience. Training such models, however, is
quite inefficient and unstable. In this work, we show how simply changing
the temporal scheduling of the update rule for the synaptic weights leads to an
algorithm that is much more efficient and stable than the original one, and has
theoretical guarantees in terms of convergence. The proposed algorithm, which we
call incremental predictive coding (iPC), is also more biologically plausible
than the original one, as it is fully automatic. In an extensive set of
experiments, we show that iPC consistently performs better than the original
formulation on a large number of benchmarks for image classification, as well
as for the training of both conditional and masked language models, in terms of
test accuracy, efficiency, and convergence with respect to a large set of
hyperparameters.
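The scheduling change described in the abstract can be made concrete with a short sketch. The code below is not the authors' implementation; it is a minimal illustration on a toy two-layer linear predictive coding network, with made-up layer sizes, learning rates, and variable names, where the only difference between the two modes is when the synaptic weights are updated.

    # Minimal sketch (not the authors' code): standard PC vs. iPC on a toy
    # two-layer linear network. Layer sizes, learning rates, and the number of
    # inference steps are illustrative assumptions; only the scheduling of the
    # synaptic (weight) updates differs between the two modes.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid, d_out = 4, 8, 2
    W0 = rng.normal(scale=0.1, size=(d_hid, d_in))   # predicts hidden layer from the input
    W1 = rng.normal(scale=0.1, size=(d_out, d_hid))  # predicts output layer from the hidden layer
    lr_x, lr_w, T = 0.2, 0.01, 32                    # inference rate, learning rate, inference steps

    def present(x_in, x_target, W0, W1, incremental):
        """Train on one (input, target) pair and return the updated weights."""
        x_hid = W0 @ x_in                            # initialise hidden value nodes at their prediction
        for _ in range(T):
            e_hid = x_hid - W0 @ x_in                # prediction error at the hidden layer
            e_out = x_target - W1 @ x_hid            # prediction error at the clamped output layer
            x_hid = x_hid + lr_x * (W1.T @ e_out - e_hid)   # inference: relax the hidden activities
            if incremental:                          # iPC: synaptic update at every inference step
                W0 = W0 + lr_w * np.outer(e_hid, x_in)
                W1 = W1 + lr_w * np.outer(e_out, x_hid)
        if not incremental:                          # standard PC: one synaptic update after inference
            W0 = W0 + lr_w * np.outer(x_hid - W0 @ x_in, x_in)
            W1 = W1 + lr_w * np.outer(x_target - W1 @ x_hid, x_hid)
        return W0, W1

    x, y = rng.normal(size=d_in), rng.normal(size=d_out)
    W0, W1 = present(x, y, W0, W1, incremental=True)   # iPC scheduling
    W0, W1 = present(x, y, W0, W1, incremental=False)  # original two-phase scheduling

In the original two-phase scheduling, the network must know when the relaxation has finished before the weights may change; updating the weights at every inference step removes that external control signal, which is the sense in which the abstract calls iPC fully automatic.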
Related papers
- Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network [9.48424754175943]
We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks.
We show that RAMDA attains the ideal structure induced by the regularizer at the stationary point of convergence.
Experiments in large-scale modern computer vision, language modeling, and speech tasks show that the proposed RAMDA is efficient and consistently outperforms the state of the art for training structured neural networks.
arXiv Detail & Related papers (2024-03-21T13:43:49Z) - qecGPT: decoding Quantum Error-correcting Codes with Generative
Pre-trained Transformers [5.392298820599664]
We propose a framework for decoding quantum error-correcting codes with generative modeling.
We use autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes.
Our framework is general and can be applied to any error model and quantum codes with different topologies.
arXiv Detail & Related papers (2023-07-18T07:34:02Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Convolutional Sparse Coding Fast Approximation with Application to
Seismic Reflectivity Estimation [9.005280130480308]
We propose a sped-up version of the classic iterative thresholding algorithm that produces a good approximation of the convolutional sparse code within 2-5 iterations.
The performance of the proposed solution is demonstrated via the seismic inversion problem in both synthetic and real data scenarios.
arXiv Detail & Related papers (2021-06-29T12:19:07Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds and still achieves a linear speedup in theory.
Our experiments on several benchmark datasets show the effectiveness of our algorithm and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)