Backpropagation as Physical Relaxation: Exact Gradients in Finite Time
- URL: http://arxiv.org/abs/2602.02281v1
- Date: Mon, 02 Feb 2026 16:21:05 GMT
- Title: Backpropagation as Physical Relaxation: Exact Gradients in Finite Time
- Authors: Antonino Emanuele Scurria
- Abstract summary: Backpropagation, the foundational algorithm for training neural networks, is shown to emerge exactly as the finite-time relaxation of a physical dynamical system; the resulting framework is termed ''Dyadic Backpropagation''. We prove that unit-step Euler discretization, the natural timescale of layer transitions, recovers standard backpropagation exactly in precisely 2L steps.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backpropagation, the foundational algorithm for training neural networks, is typically understood as a symbolic computation that recursively applies the chain rule. We show it emerges exactly as the finite-time relaxation of a physical dynamical system. By formulating feedforward inference as a continuous-time process and applying Lagrangian theory of non-conservative systems to handle asymmetric interactions, we derive a global energy functional on a doubled state space encoding both activations and sensitivities. The saddle-point dynamics of this energy perform inference and credit assignment simultaneously through local interactions. We term this framework ''Dyadic Backpropagation''. Crucially, we prove that unit-step Euler discretization, the natural timescale of layer transitions, recovers standard backpropagation exactly in precisely 2L steps for an L-layer network, with no approximations. Unlike prior energy-based methods requiring symmetric weights, asymptotic convergence, or vanishing perturbations, our framework guarantees exact gradients in finite time. This establishes backpropagation as the digitally optimized shadow of a continuous physical relaxation, providing a rigorous foundation for exact gradient computation in analog and neuromorphic substrates where continuous dynamics are native.
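To make the 2L-step claim concrete, here is a minimal sketch rather than the paper's exact energy functional or dynamics: it assumes a plain tanh feedforward network with a squared-error loss and a leaky relaxation of the form da_l/dt = -a_l + f(W_l a_{l-1}) for the activations (with an analogous adjoint relaxation for the sensitivities), all of which are illustrative assumptions. With step size 1, L forward Euler updates followed by L backward updates reproduce the chain-rule weight gradients, which the script checks against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    """Smooth activation (tanh), chosen only for this demo."""
    return np.tanh(z)

def fprime(z):
    return 1.0 - np.tanh(z) ** 2

# A small L-layer network with random weights, input, and target.
L = 3
sizes = [4, 5, 5, 3]
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(L)]
x = rng.standard_normal(sizes[0])
target = rng.standard_normal(sizes[-1])

# Phase 1 (steps 1..L): unit-step Euler on the activation relaxation
#   da_l/dt = -a_l + f(W_l a_{l-1}).
# A step of size 1 collapses to the exact layer map a_l = f(W_l a_{l-1}).
a = [x] + [np.zeros(n) for n in sizes[1:]]
z = [None] * L
for l in range(L):
    z[l] = Ws[l] @ a[l]
    a[l + 1] = a[l + 1] + 1.0 * (-a[l + 1] + f(z[l]))

loss = 0.5 * np.sum((a[-1] - target) ** 2)

# Phase 2 (steps L+1..2L): unit-step Euler on the sensitivity (adjoint)
# relaxation, driven from the top by dLoss/da_L = a_L - target.
s = [np.zeros(n) for n in sizes[1:]]
for l in reversed(range(L)):
    upstream = (a[-1] - target) if l == L - 1 else Ws[l + 1].T @ s[l + 1]
    s[l] = s[l] + 1.0 * (-s[l] + fprime(z[l]) * upstream)

# Weight gradients are local outer products of the doubled state.
grads = [np.outer(s[l], a[l]) for l in range(L)]

# Sanity check against a forward finite difference on one weight entry.
def loss_with(weights):
    h = x
    for W in weights:
        h = f(W @ h)
    return 0.5 * np.sum((h - target) ** 2)

eps = 1e-6
W_pert = [W.copy() for W in Ws]
W_pert[0][0, 0] += eps
numeric = (loss_with(W_pert) - loss) / eps
print("2L-step gradient:", grads[0][0, 0], " finite difference:", numeric)
```

Each update touches only a layer and its immediate neighbour, which is the locality property the abstract invokes for analog and neuromorphic substrates.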
Related papers
- Emergent Manifold Separability during Reasoning in Large Language Models [46.78826734548872]
Chain-of-Thought prompting significantly improves reasoning in Large Language Models. We quantify the linear separability of latent representations without the confounding factors of probe training.
arXiv Detail & Related papers (2026-02-23T20:36:17Z)
- Continuous-Time Homeostatic Dynamics for Reentrant Inference Models [0.0]
We formulate the Fast-Weights Homeostatic Reentry Network as a continuous-time neural-ODE system. The dynamics admit bounded attractors governed by an energy functional, yielding a ring-like manifold. Unlike continuous-time recurrent neural networks or liquid neural networks, FHRN achieves stability through population-level gain modulation rather than fixed recurrence or neuron-local time adaptation.
arXiv Detail & Related papers (2025-12-04T07:33:13Z)
- Temporal Lifting as Latent-Space Regularization for Continuous-Time Flow Models in AI Systems [0.0]
We present a latent-space formulation of adaptive temporal reparametrization for continuous-time dynamical systems. From the standpoint of machine-learning dynamics, temporal lifting acts as a continuous-time normalization or time-warping operator.
arXiv Detail & Related papers (2025-10-10T19:06:32Z)
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-06-26T22:40:30Z)
- Generative System Dynamics in Recurrent Neural Networks [56.958984970518564]
We investigate the continuous-time dynamics of Recurrent Neural Networks (RNNs). We show that skew-symmetric weight matrices are fundamental to enable stable limit cycles in both linear and nonlinear configurations. Numerical simulations showcase how nonlinear activation functions not only maintain limit cycles, but also enhance the numerical stability of the system integration process.
arXiv Detail & Related papers (2025-04-16T10:39:43Z)
- Geometrically Taming Dynamical Entanglement Growth in Purified Quantum States [0.0]
Entanglement properties of purified quantum states are of key interest in quantum information theory.
We show how geometric methods may be harnessed to reduce such dynamical entanglement growth.
We also obtain a general prescription for maintaining (locally) optimal entanglement entropy when time-evolving a purified state.
arXiv Detail & Related papers (2023-09-14T18:00:07Z)
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that Neural Network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic integrators to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)