Related papers: Improving equilibrium propagation without weight symmetry through Jacobian homeostasis

URL: http://arxiv.org/abs/2309.02214v2
Date: Mon, 8 Apr 2024 07:55:43 GMT
Title: Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
Authors: Axel Laborieux, Friedemann Zenke
Abstract summary: Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP). EP requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. We show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. We present a new homeostatic objective that directly mitigates functional asymmetries of the Jacobian at the network's fixed point.
Score: 7.573586022424398
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
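The Cauchy-integral claim in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation and does not involve an EP network; it assumes a scalar complex-differentiable function and uses hypothetical helper names (`cauchy_derivative`, `finite_difference`). It shows only the general idea that averaging finite-size complex perturbations around a circle recovers a derivative that a one-sided finite perturbation estimates with bias.

```python
import cmath
import math


def cauchy_derivative(f, x0, radius=0.5, n_points=16):
    """Estimate f'(x0) via a discretized Cauchy integral.

    Uses f'(x0) = (1 / (2*pi*i)) * contour integral of f(z) / (z - x0)**2,
    evaluated on a circle of finite radius around x0 (illustrative only).
    """
    total = 0j
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        phase = cmath.exp(1j * theta)
        # Evaluate f at a finite complex perturbation and demodulate by e^{-i*theta}.
        total += f(x0 + radius * phase) / phase
    return total / (n_points * radius)


def finite_difference(f, x0, step=0.5):
    """One-sided finite difference with the same perturbation size."""
    return (f(x0 + step) - f(x0)) / step


# Toy holomorphic function whose derivative at 0 is exactly 1.
f = cmath.exp
estimate = cauchy_derivative(f, 0.0)
biased = finite_difference(f, 0.0)

print("Cauchy estimate error:    ", abs(estimate - 1.0))
print("Finite difference error:  ", abs(biased - 1.0))
```

With the same perturbation size of 0.5, the contour average matches the true derivative to near machine precision, while the one-sided difference is visibly biased.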

Related papers

Weight transport through spike timing for robust local gradients [0.5236468296934584]
Plasticity in functional neural networks is frequently expressed as gradient descent on a cost. This imposes symmetry constraints that are difficult to reconcile with local computation. We introduce spike-based alignment learning, which uses spike timing statistics to extract and correct the asymmetry between effective reciprocal connections.
arXiv Detail & Related papers (2025-03-04T14:05:39Z)
Learning Broken Symmetries with Approximate Invariance [1.0485739694839669]
In many cases, the exact underlying symmetry is present only in an idealized dataset and is broken in actual data. Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry. We propose a learning model which balances the generality and performance of unconstrained networks with the rapid learning of constrained networks.
arXiv Detail & Related papers (2024-12-25T04:29:04Z)
Symmetry Adapted Residual Neural Network Diabatization: Conical Intersections in Aniline Photodissociation [1.2365038403958204]
We present a symmetry adapted neural network (SAResNet) diabatization method to construct quasi-diabatic Hamiltonians. Our SAResNet is applied to construct the full 36-dimensional coupled diabatic potential energy surfaces for aniline N-H bond photodissociation.
arXiv Detail & Related papers (2024-11-03T21:56:25Z)
Deep Learning without Weight Symmetry [1.3812010983144802]
Backpropagation (BP) is a foundational algorithm for training artificial neural networks. BP is often considered biologically implausible. Here we introduce the Product Feedback Alignment (PFA) algorithm.
arXiv Detail & Related papers (2024-05-31T03:11:19Z)
Lie Point Symmetry and Physics Informed Networks [59.56218517113066]
We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function. Our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions. Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.
arXiv Detail & Related papers (2023-11-07T19:07:16Z)
Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks, leading to better generalisation performance. Symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
Law of Balance and Stationary Distribution of Stochastic Gradient Descent [11.937085301750288]
We prove that the minibatch noise of stochastic gradient descent (SGD) regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. We then derive the stationary distribution of gradient flow for a diagonal linear network with arbitrary depth and width. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.
arXiv Detail & Related papers (2023-08-13T03:13:03Z)
Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
Approximately Equivariant Networks for Imperfectly Symmetric Dynamics [24.363954435050264]
We find that our models can outperform both baselines with no symmetry bias and baselines with overly strict symmetry in both simulated turbulence domains and real-world multi-stream jet flow.
arXiv Detail & Related papers (2022-01-28T07:31:28Z)
Complexity from Adaptive-Symmetries Breaking: Global Minima in the Statistical Mechanics of Deep Neural Networks [0.0]
Adaptive symmetry, a concept antithetical to conservative symmetry in physics, is proposed to understand deep neural networks (DNNs). We characterize the optimization process of a DNN system as an extended adaptive-symmetry-breaking process. More specifically, this process is characterized by a statistical-mechanical model that could be appreciated as a generalization of statistical physics.
arXiv Detail & Related papers (2022-01-03T09:06:44Z)
On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes. We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST. We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon. We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.