On the Infinite Width and Depth Limits of Predictive Coding Networks
- URL: http://arxiv.org/abs/2602.07697v1
- Date: Sat, 07 Feb 2026 20:47:32 GMT
- Title: On the Infinite Width and Depth Limits of Predictive Coding Networks
- Authors: Francesco Innocenti, El Mehdi Achour, Rafal Bogacz,
- Abstract summary: Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP). Recent work has improved the training stability of deep PC networks. We study the infinite width and depth limits of PCNs.
- Score: 8.779034498638826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP) that minimises an energy function with respect to network activities before updating weights. Recent work has improved the training stability of deep PC networks (PCNs) by leveraging some BP-inspired reparameterisations. However, the full scalability and theoretical basis of these approaches remain unclear. To address this, we study the infinite width and depth limits of PCNs. For linear residual networks, we show that the set of width- and depth-stable feature-learning parameterisations for PC is exactly the same as for BP. Moreover, under any of these parameterisations, the PC energy with equilibrated activities converges to the BP loss in a regime where the model width is much larger than the depth, resulting in PC computing the same gradients as BP. Experiments show that these results hold in practice for deep nonlinear networks, as long as an activity equilibrium seems to be reached. Overall, this work unifies various previous theoretical and empirical results and has potentially important implications for the scaling of PCNs.
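To make the procedure described in the abstract concrete, here is a minimal sketch of one PC training step on a small fully connected network, assuming a standard squared-error energy with tanh units; the function names (pc_energy, pc_step), the parameterisation, and the hyperparameters are illustrative choices and not the paper's residual setup or code.

```python
import numpy as np

def pc_energy(weights, activities, x, y, act=np.tanh):
    """Energy F = 1/2 * sum_l ||z_l - W_l act(z_{l-1})||^2, with z_0 = x and z_L clamped to y."""
    zs = [x] + activities + [y]
    return 0.5 * sum(np.sum((zs[l + 1] - W @ act(zs[l])) ** 2)
                     for l, W in enumerate(weights))

def pc_step(weights, x, y, n_inference=100, lr_z=0.05, lr_w=0.01, act=np.tanh):
    """One PC step: relax the hidden activities on the energy, then update the weights."""
    # Initialise hidden activities with a feedforward pass.
    activities, z = [], x
    for W in weights[:-1]:
        z = W @ act(z)
        activities.append(z)

    # Inference phase: gradient descent on the energy with respect to the activities only.
    for _ in range(n_inference):
        zs = [x] + activities + [y]
        errors = [zs[l + 1] - W @ act(zs[l]) for l, W in enumerate(weights)]
        for l in range(len(activities)):
            # dF/dz_l = e_l - W_{l+1}^T e_{l+1} * act'(z_l), with tanh'(z) = 1 - tanh(z)^2.
            grad_z = errors[l] - (weights[l + 1].T @ errors[l + 1]) * (1 - act(activities[l]) ** 2)
            activities[l] = activities[l] - lr_z * grad_z

    # Learning phase: one gradient step on the equilibrated energy with respect to the weights.
    zs = [x] + activities + [y]
    return [W + lr_w * np.outer(zs[l + 1] - W @ act(zs[l]), act(zs[l]))
            for l, W in enumerate(weights)]

# Illustrative usage: a 3-layer network with input dim 4, hidden width 8, output dim 2.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.3, size=s) for s in [(8, 4), (8, 8), (2, 8)]]
x, y = rng.normal(size=4), np.array([1.0, 0.0])
weights = pc_step(weights, x, y)
```

Per the abstract, under a width- and depth-stable parameterisation and with width much larger than depth, the energy at the activity equilibrium approaches the BP loss, so the weight updates in the learning phase above approach the gradients BP would compute.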
Related papers
- Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice [1.2691047660244335]
Backpropagation (BP) is the standard algorithm for training the deep neural networks that power modern artificial intelligence. This thesis studies an alternative, potentially more efficient brain-inspired algorithm called predictive coding (PC).
arXiv Detail & Related papers (2025-10-24T14:47:49Z)
- Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation [8.35644084613785]
We introduce the maximal update parameterization ($\mu$P) in the infinite-width limit for two representative designs of local targets. By analyzing deep linear networks, we found that PC's gradients interpolate between first-order and Gauss-Newton-like gradients. We demonstrate that, in specific standard settings, PC in the infinite-width limit behaves more similarly to the first-order gradient.
arXiv Detail & Related papers (2024-11-04T11:38:27Z)
- Tight Stability, Convergence, and Robustness Bounds for Predictive Coding Networks [60.3634789164648]
Energy-based learning algorithms, such as predictive coding (PC), have garnered significant attention in the machine learning community.
We rigorously analyze the stability, robustness, and convergence of PC through the lens of dynamical systems theory.
arXiv Detail & Related papers (2024-10-07T02:57:26Z)
- Only Strict Saddles in the Energy Landscape of Predictive Coding Networks? [2.499907423888049]
Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights.
We study the geometry of the PC energy landscape at the inference equilibrium of the network activities.
arXiv Detail & Related papers (2024-08-21T20:23:44Z)
- A Theoretical Framework for Inference and Learning in Predictive Coding Networks [41.58529335439799]
Predictive coding (PC) is an influential theory in computational neuroscience.
We provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration.
arXiv Detail & Related papers (2022-07-21T04:17:55Z)
- On the Convergence of Certified Robust Training with Interval Bound Propagation [147.77638840942447]
We present a theoretical analysis of the convergence of IBP training.
We show that when using IBP training to train a randomly initialized two-layer ReLU neural network with logistic loss, gradient descent can linearly converge to zero robust training error.
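As a rough illustration of the bound propagation that IBP training optimises (a minimal sketch under assumed notation, not this paper's code; the helper names ibp_affine and ibp_relu are hypothetical), interval bounds on a perturbed input are pushed through each layer in centre/radius form:

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    # Propagate elementwise bounds [lo, hi] through x -> W x + b.
    centre = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    out_centre = W @ centre + b
    out_radius = np.abs(W) @ radius   # worst-case spread through the weights
    return out_centre - out_radius, out_centre + out_radius

def ibp_relu(lo, hi):
    # ReLU is monotone, so bounds pass through elementwise.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Hypothetical usage: certified output bounds for an eps-ball around an input x.
x, eps = np.array([0.5, -0.2]), 0.1
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
lo, hi = ibp_relu(*ibp_affine(x - eps, x + eps, W1, b1))
```

IBP training then minimises the loss evaluated on the worst-case logits implied by these bounds.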
arXiv Detail & Related papers (2022-03-16T21:49:13Z)
- A Theoretical View of Linear Backpropagation and Its Convergence [55.69505060636719]
Backpropagation (BP) is widely used for calculating gradients in deep neural networks (DNNs).
Recently, a linear variant of BP named LinBP was introduced for generating more transferable adversarial examples for performing black-box attacks.
We provide theoretical analyses on LinBP in neural-network-involved learning tasks, including adversarial attack and model training.
arXiv Detail & Related papers (2021-12-21T07:18:00Z)
- Towards Evaluating and Training Verifiably Robust Neural Networks [81.39994285743555]
We study the relationship between IBP and CROWN, and prove that CROWN is always tighter than IBP when choosing appropriate bounding lines.
We propose a relaxed version of CROWN, linear bound propagation (LBP), that can be used to verify large networks to obtain lower verified errors.
arXiv Detail & Related papers (2021-04-01T13:03:48Z)
- Predictive Coding Can Do Exact Backpropagation on Convolutional and Recurrent Neural Networks [40.51949948934705]
Predictive coding networks (PCNs) are an influential model for information processing in the brain.
BP is commonly regarded as the most successful learning method in modern machine learning.
We show that a biologically plausible algorithm is able to exactly replicate the accuracy of BP on complex architectures.
arXiv Detail & Related papers (2021-03-05T14:57:01Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [62.43908463620527]
In practice, EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
These results highlight EP as a scalable approach to compute error gradients in deep neural networks, thereby motivating its hardware implementation.
arXiv Detail & Related papers (2021-01-14T10:23:40Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
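For context on the finite-nudging bias discussed in the two Equilibrium Propagation entries above, a sketch of the two estimators in assumed notation (Φ is the network's primitive/energy function, β the nudging strength, L the task loss; signs follow the usual EP convention and may differ from the papers'):

```latex
% One-sided estimate: biased at first order in the nudging strength beta.
\hat{\nabla}_\theta^{\mathrm{EP}}(\beta)
  = \frac{1}{\beta}\left(
      \frac{\partial \Phi}{\partial \theta}\Big|_{\beta}
    - \frac{\partial \Phi}{\partial \theta}\Big|_{0}\right)
  = \nabla_\theta \mathcal{L} + O(\beta),
\qquad
% Symmetric (two-sided) nudging cancels the first-order term.
\hat{\nabla}_\theta^{\mathrm{sym}}(\beta)
  = \frac{1}{2\beta}\left(
      \frac{\partial \Phi}{\partial \theta}\Big|_{+\beta}
    - \frac{\partial \Phi}{\partial \theta}\Big|_{-\beta}\right)
  = \nabla_\theta \mathcal{L} + O(\beta^{2}).
```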
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.