When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics
- URL: http://arxiv.org/abs/2512.18209v2
- Date: Thu, 25 Dec 2025 19:43:48 GMT
- Title: When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics
- Authors: Yizhou Zhang
- Abstract summary: Empirical power-law scaling has been widely observed across modern deep learning systems. We show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence.
- Score: 2.779943773196378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical power-law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution-Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse-grained dynamical description of training. Within GRSD, power-law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process. In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse-grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution along training, and log-shift invariance of renormalized shell couplings. We further show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log-shift invariance is combined with the intrinsic time-rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power-law form.
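The rigidity mechanism described in the abstract can be illustrated by a simple functional-equation argument; the notation below is ours, not the paper's, and the exponent $\gamma$ is left generic. If a renormalized velocity field $v(\lambda, t)$ over the log-resolution shell variable $\lambda$ is log-shift invariant and covariant under time rescaling, its time dependence is pinned to a power law:

```latex
% Illustrative sketch: the symbols v, g, and gamma are assumptions, not the paper's notation.
\[
v(\lambda + a, t) = v(\lambda, t) \quad \forall a
\;\Longrightarrow\; v(\lambda, t) = g(t),
\]
\[
g(ct) = c^{-\gamma}\, g(t) \quad \forall c > 0
\;\Longrightarrow\; g(c) = g(1)\, c^{-\gamma} \quad \text{(set } t = 1\text{)}.
\]
```

The second implication is the standard rigidity step: a function homogeneous under every positive rescaling of its argument is necessarily a pure power.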
Related papers
- Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits [72.0643009153473]
We show that expressive variational ansätze enter a Haar-like universality class in which both observable expectation values and parameter gradients concentrate exponentially with system size. As a consequence, the hypothesis class induced by such circuits collapses with high probability to a narrow family of near-constant functions. We further show that this collapse is not unavoidable: tensor-structured VQCs, including tensor-network-based and tensor-hypernetwork parameterizations, lie outside the Haar-like universality class.
arXiv Detail & Related papers (2026-01-05T08:04:33Z) - Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias [1.219017431258669]
We show that constraints shape dynamics so that they function not as limitations, but as a temporal inductive bias that breeds generalization. We show that robust AI development requires not only scaling and removing limitations, but computationally mastering the temporal characteristics that naturally promote generalization.
arXiv Detail & Related papers (2025-12-30T00:34:24Z) - Unifying Learning Dynamics and Generalization in Transformers Scaling Law [1.5229257192293202]
The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. This work formalizes the learning dynamics of transformer-based language models as an ordinary differential equation (ODE) system. Our analysis characterizes the convergence of generalization error to the irreducible risk as computational resources scale with data.
arXiv Detail & Related papers (2025-12-26T17:20:09Z) - Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws [2.779943773196378]
We show that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. For mean-squared error loss, the training error evolves as $\dot{e}_t = -M(t)\,e_t$ with $M(t) = J(t)J(t)^{*}$, a time-dependent self-adjoint operator induced by the network Jacobian. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
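The macroscopic law $\dot{e}_t = -M(t)\,e_t$ admits a simple toy illustration: with a fixed self-adjoint $M$ whose spectrum decays as a power law, the summed squared error itself decays as a power law in training time. The sketch below uses assumed exponents (mode decay rates $k^{-1}$, initial mode energies $k^{-2}$, giving slope $\approx -1$ in the window $1 \ll t \ll K$) and is an illustration of the general mechanism, not the paper's construction.

```python
import math

def shell_loss(t: float, K: int = 200_000) -> float:
    """Toy diagonal spectral dynamics e_k(t) = e_k(0) * exp(-lam_k * t),
    with assumed power-law decay rates lam_k = 1/k and initial mode
    energies e_k(0)^2 = 1/k^2; returns the summed squared error."""
    return sum(math.exp(-2.0 * t / k) / (k * k) for k in range(1, K + 1))

# In the window 1 << t << K, modes with k < t have decayed and the sum
# behaves like ~ 1/(2t): a power law in training time with exponent -1.
t1, t2 = 100.0, 1000.0
slope = math.log(shell_loss(t2) / shell_loss(t1)) / math.log(t2 / t1)
print(f"log-log slope over [{t1:g}, {t2:g}]: {slope:.3f}")  # close to -1
```

The power-law window is finite: for $t \gtrsim K$ every mode has decayed and the loss falls off exponentially instead.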
arXiv Detail & Related papers (2025-12-11T08:38:46Z) - Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data [15.766916122461923]
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of equations governs the evolution of the summary statistics.
arXiv Detail & Related papers (2025-11-24T00:21:17Z) - Identifiable learning of dissipative dynamics [25.409059056398124]
We introduce I-OnsagerNet, a neural framework that learns dissipative dynamics directly from trajectories. I-OnsagerNet extends the Onsager principle to guarantee that the learned potential is obtained from the stationary density. Our approach enables us to calculate the entropy production and to quantify irreversibility, offering a principled way to detect and quantify deviations from equilibrium.
arXiv Detail & Related papers (2025-10-28T07:57:14Z) - NeuralGrok: Accelerate Grokking by Neural Gradient Transformation [54.65707216563953]
We propose NeuralGrok, a gradient-based approach that learns an optimal gradient transformation to accelerate generalization of transformers in arithmetic tasks. Our experiments demonstrate that NeuralGrok significantly accelerates generalization, particularly in challenging arithmetic tasks. We also show that NeuralGrok promotes a more stable training paradigm, consistently reducing the model's complexity.
arXiv Detail & Related papers (2025-04-24T04:41:35Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - SEGNO: Generalizing Equivariant Graph Neural Networks with Physical Inductive Biases [66.61789780666727]
We show how the second-order continuity can be incorporated into GNNs while maintaining the equivariant property.
We also offer theoretical insights into SEGNO, highlighting that it can learn a unique trajectory between adjacent states.
Our model yields a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2023-08-25T07:15:58Z) - Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.