Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?
- URL: http://arxiv.org/abs/2408.11979v2
- Date: Fri, 08 Nov 2024 16:19:49 GMT
- Title: Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?
- Authors: Francesco Innocenti, El Mehdi Achour, Ryan Singh, Christopher L. Buckley
- Abstract summary: Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights.
We study the geometry of the PC energy landscape at the inference equilibrium of the network activities.
- Score: 2.499907423888049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is not theoretically well understood. Here, we study the geometry of the PC energy landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated energy is simply a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss including the origin become much easier to escape (strict) in the equilibrated energy. Our theory is validated by experiments on both linear and non-linear networks. Based on these and other results, we conjecture that all the saddles of the equilibrated energy are strict. Overall, this work suggests that PC inference makes the loss landscape more benign and robust to vanishing gradients, while also highlighting the fundamental challenge of scaling PC to deeper models.
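To make the abstract's objects concrete, the standard PC energy and the equilibrated energy it refers to can be written as below. The rescaled-MSE result is shown only schematically: S(theta) stands in for the weight-dependent rescaling, whose exact expression is derived in the paper rather than assumed here.
```latex
% Standard PC energy for an L-layer network with weights \theta = (W_1,\dots,W_L),
% activities z_1,\dots,z_{L-1}, and clamped boundaries z_0 = x, z_L = y:
\mathcal{E}(\theta, z) = \frac{1}{2} \sum_{\ell=1}^{L}
    \bigl\lVert z_\ell - W_\ell z_{\ell-1} \bigr\rVert^2
% Equilibrated energy (the value once inference has converged). For deep
% linear networks the abstract states this is a rescaled MSE; schematically:
\mathcal{F}(\theta) = \min_{z_1,\dots,z_{L-1}} \mathcal{E}(\theta, z)
    = \frac{1}{2}\, r^{\top} S(\theta)^{-1} r,
\qquad r = y - W_L W_{L-1} \cdots W_1 x
```
The strict-versus-non-strict claim can also be checked numerically on a toy example. The sketch below is a hedged illustration, not the paper's code: it uses a scalar three-layer linear network, where the energy is quadratic in the activities so inference reduces to a small linear solve, and finite differences then show that the MSE Hessian at the origin is identically zero (non-strict saddle) while the equilibrated energy picks up a strictly negative eigenvalue (strict saddle).
```python
# Toy check: Hessian eigenvalues at the origin for the MSE vs. the
# equilibrated PC energy, for a scalar 3-layer linear net y_hat = w3*w2*w1*x.
# Illustrative sketch only; names and setup are assumptions, not the paper's.
import numpy as np

x, y = 1.0, 1.0  # a single training pair

def mse(w):
    w1, w2, w3 = w
    return 0.5 * (y - w3 * w2 * w1 * x) ** 2

def equilibrated_energy(w):
    """F(w) = min over activities (z1, z2) of the PC energy
    E = 0.5*[(z1 - w1*x)^2 + (z2 - w2*z1)^2 + (y - w3*z2)^2].
    E is quadratic in z, so inference reduces to a 2x2 linear solve."""
    w1, w2, w3 = w
    A = np.array([[1 + w2**2, -w2],
                  [-w2, 1 + w3**2]])
    b = np.array([w1 * x, w3 * y])
    z1, z2 = np.linalg.solve(A, b)
    return 0.5 * ((z1 - w1 * x) ** 2
                  + (z2 - w2 * z1) ** 2
                  + (y - w3 * z2) ** 2)

def hessian(f, w0, eps=1e-4):
    """Central finite-difference Hessian of f at w0."""
    n = len(w0)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = eps * np.eye(n)[i]
            ej = eps * np.eye(n)[j]
            H[i, j] = (f(w0 + ei + ej) - f(w0 + ei - ej)
                       - f(w0 - ei + ej) + f(w0 - ei - ej)) / (4 * eps**2)
    return H

origin = np.zeros(3)
# MSE at the origin: the Hessian is identically zero -> non-strict saddle.
print("min eig, MSE:", np.linalg.eigvalsh(hessian(mse, origin)).min())
# Equilibrated energy: a strictly negative eigenvalue appears -> strict
# saddle, which gradient descent can escape.
print("min eig, F:  ", np.linalg.eigvalsh(hessian(equilibrated_energy, origin)).min())
```
Running this prints a minimum eigenvalue of 0 for the MSE and roughly -y^2 for the equilibrated energy; that negative curvature direction is what makes the origin escapable by gradient descent, in line with the abstract's claim.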
Related papers
- On the Infinite Width and Depth Limits of Predictive Coding Networks [8.779034498638826]
Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP). Recent work has improved the training stability of deep PC networks. We study the infinite width and depth limits of PCNs.
arXiv Detail & Related papers (2026-02-07T20:47:32Z) - Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice [1.2691047660244335]
Backpropagation (BP) is the standard algorithm for training the deep neural networks that power modern artificial intelligence. This thesis studies an alternative, potentially more efficient brain-inspired algorithm called predictive coding (PC).
arXiv Detail & Related papers (2025-10-24T14:47:49Z) - Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks (a minimal sketch of this recipe appears after this list).
arXiv Detail & Related papers (2024-02-29T00:02:33Z) - The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks [34.85235641812005]
We reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures.
This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks.
arXiv Detail & Related papers (2023-06-01T21:24:53Z) - Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z) - Random Weights Networks Work as Loss Prior Constraint for Image Restoration [50.80507007507757]
We present our belief that "Random Weights Networks can act as a Loss Prior Constraint for Image Restoration".
This prior can be directly inserted into existing networks without any additional training or testing cost.
To emphasize, our main focus is to spark research on loss function design and rescue it from its currently neglected status.
arXiv Detail & Related papers (2023-03-29T03:43:51Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but still struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Learning Energy Networks with Generalized Fenchel-Young Losses [34.46284877812228]
Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function.
We propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks.
arXiv Detail & Related papers (2022-05-19T14:32:04Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weights do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z) - Towards Understanding Learning in Neural Networks with Linear Teachers [31.849269592822296]
We prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations.
We provide theoretical support for this phenomenon by proving that if the network weights converge to two weight clusters, the decision boundary is approximately linear.
arXiv Detail & Related papers (2021-01-07T13:21:24Z) - The Golden Ratio of Learning and Momentum [0.5076419064097732]
This paper proposes a new information-theoretical loss function motivated by neural signal processing in a synapse.
All results taken together show that loss, learning rate, and momentum are closely connected.
arXiv Detail & Related papers (2020-06-08T17:08:13Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
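Relatedly, the layer-norm-plus-weight-decay recipe from the plasticity-loss entry above is simple to express in code. The sketch below is a hypothetical PyTorch rendering under assumed architecture and hyperparameters, not that paper's experimental setup.
```python
# Hypothetical sketch of the plasticity recipe: layer normalization inside
# the model plus decoupled weight decay in the optimizer. The layer sizes,
# learning rate, and decay strength are illustrative assumptions.
import torch.nn as nn
from torch.optim import AdamW

model = nn.Sequential(
    nn.Linear(64, 256),
    nn.LayerNorm(256),   # normalize activations after each hidden layer
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.LayerNorm(256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# AdamW applies weight decay directly to the weights (decoupled from the
# adaptive gradient step), keeping their norms bounded over long runs.
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
```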