Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws
- URL: http://arxiv.org/abs/2512.10427v2
- Date: Mon, 15 Dec 2025 06:45:10 GMT
- Title: Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws
- Authors: Yizhou Zhang
- Abstract summary: We show that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{\theta(t)}J_{\theta(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
- Score: 2.779943773196378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive such structure directly from gradient descent in function space. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{\theta(t)}J_{\theta(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled modewise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and external inter-shell interactions. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming an effective power-law spectral transport in a relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
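To make the modewise picture concrete, here is a minimal numerical sketch of the lazy (frozen-$M$) limit, assuming a power-law spectrum for $M$; the exponent, shell width, and initial errors are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Frozen-M (lazy-limit) toy model: eigenvalues lambda_k ~ k^{-b}, so each mode
# decays independently, e_k(t) = e_k(0) * exp(-lambda_k * t). The exponent b,
# the shell width, and the unit initial errors are illustrative assumptions.
K, b, t = 10_000, 1.5, 100.0
lam = np.arange(1, K + 1, dtype=float) ** (-b)  # power-law spectrum of M = J J*
e_t = np.exp(-lam * t)                          # modewise errors, e_k(0) = 1

# Logarithmic spectral shells: shell j collects modes with lambda in [2^-(j+1), 2^-j).
shell = np.floor(-np.log2(lam)).astype(int)
energy = np.bincount(shell, weights=e_t ** 2)   # quadratic error energy per shell
for j, E in enumerate(energy[:8]):
    print(f"shell {j}: energy {E:.3e}")
```

With a time-dependent $M(t)$, shell energies would additionally exchange the inter-shell flux described in the abstract; freezing $M$ isolates the pure dissipation term.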
Related papers
- When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics [2.779943773196378]
Empirical power-law scaling has been widely observed across modern deep learning systems. We show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence.
arXiv Detail & Related papers (2025-12-20T04:15:07Z)
- Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration [67.12978375116599]
We show that the steps of gradient descent (GD) reduce to those of generalized perceptron algorithms. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks.
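As a hedged illustration of the kind of reduction the title suggests (my own construction on a linear model with logistic loss, not the paper's derivation): one GD step is a perceptron-style update in which each example contributes $y_i x_i$ weighted by how badly it is currently classified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_step(w, X, y, lr=0.1):
    # One GD step on the logistic loss: a "soft" perceptron update in which
    # example i contributes y_i * x_i weighted by sigmoid(-margin_i).
    margins = y * (X @ w)                   # signed margins y_i <w, x_i>
    weights = sigmoid(-margins)             # ~1 if badly misclassified, ~0 if confident
    return w + lr * (X.T @ (weights * y)) / len(y)

def perceptron_step(w, X, y, lr=0.1):
    # Classic perceptron for comparison: the same update with hard 0/1 weights.
    margins = y * (X @ w)
    weights = (margins <= 0).astype(float)
    return w + lr * (X.T @ (weights * y)) / len(y)
```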
arXiv Detail & Related papers (2025-12-12T14:16:35Z)
- Lazy Diffusion: Mitigating spectral collapse in generative diffusion-based stable autoregressive emulation of turbulent flows [0.0]
We show that standard DDPMs induce a fundamental \emph{spectral collapse}. We introduce power-law schedules that preserve fine-scale structure deeper into diffusion time. These methods are applied to high-Reynolds-number 2D Kolmogorov turbulence and $1/12^\circ$ Gulf of Mexico ocean reanalysis.
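A sketch of the schedule idea; the specific form $\bar\alpha(t) = 1 - t^4$ is a hypothetical stand-in, not the paper's schedule. It keeps the per-step signal-to-noise ratio high deeper into diffusion time than the standard linear-beta DDPM schedule, which is the mechanism claimed to preserve fine-scale structure.

```python
import numpy as np

# Hypothetical power-law signal schedule vs. the standard linear-beta DDPM
# schedule. The form 1 - t**4 is an assumption for illustration only.
T = 1000
t = np.linspace(0.0, 1.0, T)
alpha_bar_pow = 1.0 - t ** 4                   # signal decays slowly at first
beta_lin = np.linspace(1e-4, 2e-2, T)          # standard DDPM linear betas
alpha_bar_lin = np.cumprod(1.0 - beta_lin)

def snr(alpha_bar):
    # Per-step signal-to-noise ratio; higher SNR late in diffusion time means
    # fine-scale structure survives longer before being drowned in noise.
    return alpha_bar / (1.0 - alpha_bar + 1e-12)

print(snr(alpha_bar_pow)[T // 2], snr(alpha_bar_lin)[T // 2])
```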
arXiv Detail & Related papers (2025-12-10T12:05:32Z)
- Fast-Forward Lattice Boltzmann: Learning Kinetic Behaviour with Physics-Informed Neural Operators [37.65214107289304]
We introduce a physics-informed neural operator framework for the lattice Boltzmann equation (LBE). Our framework is discretization-invariant, enabling models trained on coarse lattices to generalise to finer ones. Results demonstrate robustness across complex flow scenarios, including von Kármán vortex shedding, ligament breakup, and bubble adhesion.
arXiv Detail & Related papers (2025-09-26T14:36:23Z)
- Enabling Automatic Differentiation with Mollified Graph Neural Operators [73.52999622724101]
We propose the mollified graph neural operator ($m$GNO), the first method to leverage automatic differentiation and compute exact gradients on arbitrary geometries. For a PDE example on regular grids, $m$GNO paired with autograd reduced the L2 relative data error by 20x compared to finite differences. It can also solve PDEs on unstructured point clouds seamlessly, using physics losses only, at resolutions vastly lower than those needed for finite differences to be accurate enough.
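A sketch of why autograd matters here, not the $m$GNO architecture itself: if the model output is a differentiable function of the input coordinates, physics-loss derivatives are exact at arbitrary points, with no finite-difference grid. The Laplacian helper and the toy model below are my own illustration.

```python
import torch

def laplacian(model, xy):
    # Exact Laplacian of the model output at arbitrary points via autograd,
    # in contrast with finite differences, which need a regular grid.
    xy = xy.clone().requires_grad_(True)
    u = model(xy).sum()
    (grad,) = torch.autograd.grad(u, xy, create_graph=True)
    lap = torch.zeros(xy.shape[0])
    for i in range(xy.shape[1]):
        (g2,) = torch.autograd.grad(grad[:, i].sum(), xy, create_graph=True)
        lap = lap + g2[:, i]  # accumulate d^2 u / dx_i^2
    return lap

# Usage on an unstructured point cloud (any smooth model works):
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
pts = torch.rand(128, 2)            # arbitrary points, no grid required
print(laplacian(model, pts).shape)  # torch.Size([128])
```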
arXiv Detail & Related papers (2025-04-11T06:16:30Z)
- DimINO: Dimension-Informed Neural Operator Learning [41.37905663176428]
DimINO is a framework inspired by dimensional analysis. It can be seamlessly integrated into existing neural operator architectures. It achieves up to a 76.3% performance gain on PDE datasets.
arXiv Detail & Related papers (2024-10-08T10:48:50Z)
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets and fully-connected nets depending on an 'effective depth'. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling the functions represented by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{\,1 - \frac{1}{\alpha}}\right)$.
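A minimal sketch of one federated AdaGrad round, simplified relative to the paper: aggregation below is noiseless, whereas the paper analyzes noisy over-the-air analog aggregation. Function and variable names are illustrative.

```python
import numpy as np

def fed_adagrad_round(w, v, client_grads, lr=0.1, eps=1e-8):
    # One server round: average client gradients (idealized; the over-the-air
    # setting of the paper adds analog aggregation noise at this step), then
    # apply an AdaGrad update with per-coordinate accumulator v.
    g = np.mean(client_grads, axis=0)
    v = v + g ** 2
    return w - lr * g / (np.sqrt(v) + eps), v
```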
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Quantum Simulation of Lindbladian Dynamics via Repeated Interactions [0.5097809301149342]
We make use of an approximate correspondence between Lindbladian dynamics and evolution based on Repeated Interaction (RI) CPTP maps. We show that the number of interactions needed to simulate the Liouvillian $e^{t\mathcal{L}}$ within error $\epsilon$ scales in a weak coupling limit.
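A toy sketch of a single repeated-interaction step, under my own construction: exchange-couple the system qubit to a fresh $|0\rangle$ ancilla with the weak-coupling $\sqrt{dt}$ scaling, trace the ancilla out, and iterate; the composition of these CPTP maps approximates the amplitude-damping semigroup $e^{t\mathcal{L}}$.

```python
import numpy as np
from scipy.linalg import expm

sm = np.array([[0, 1], [0, 0]], dtype=complex)               # sigma_minus
H_int = np.kron(sm, sm.conj().T) + np.kron(sm.conj().T, sm)  # excitation exchange

def ri_step(rho, g=1.0, dt=0.01):
    # Couple system (first factor) to a fresh |0> ancilla, evolve, trace out.
    U = expm(-1j * g * np.sqrt(dt) * H_int)                  # sqrt(dt): weak coupling
    ancilla = np.array([[1, 0], [0, 0]], dtype=complex)      # |0><0|
    full = U @ np.kron(rho, ancilla) @ U.conj().T
    return full.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)  # partial trace

rho = np.array([[0, 0], [0, 1]], dtype=complex)              # start in |1>
for _ in range(200):
    rho = ri_step(rho)
print(rho[1, 1].real)  # excited-state population decays toward 0
```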
arXiv Detail & Related papers (2023-12-08T21:17:16Z)
- Quantum simulation of dissipation for Maxwell equations in dispersive media [0.0]
Dissipation appears in the Schrödinger representation of classical Maxwell equations as a sparse diagonal operator occupying an $r$-dimensional subspace.
The unitary operators can be implemented through qubit lattice algorithm (QLA) on $n$ qubits.
The non-unitary-dissipative part poses a challenge on how it should be implemented on a quantum computer.
arXiv Detail & Related papers (2023-07-31T18:22:40Z)
- Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks [0.0]
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow.
We show that the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK).
We conclude that damped deviations offer a simple and unifying perspective on the dynamics when optimizing the squared error.
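A toy instance of the damped-deviations picture, with an RBF kernel standing in for the NTK integral operator $T_{K^\infty}$ (my simplification): the error coordinate along the $k$-th eigenvector decays as $e^{-\lambda_k t}$.

```python
import numpy as np

# Project the initial error onto the eigenbasis of a kernel gram matrix and
# damp each coordinate by exp(-lambda_k * t); an RBF kernel is used here as a
# stand-in for the NTK, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
K = np.exp(-((X - X.T) ** 2) / 0.5)              # surrogate kernel gram matrix
lam, V = np.linalg.eigh(K / len(X))              # eigenpairs of the operator
e0 = np.sin(3 * np.pi * X[:, 0])                 # initial function-space error
coef = V.T @ e0                                  # coordinates in the eigenbasis
t = 50.0
e_t = V @ (coef * np.exp(-lam * t))              # each mode damped at rate lam_k
print(np.linalg.norm(e0), np.linalg.norm(e_t))   # error norm shrinks
```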
arXiv Detail & Related papers (2022-01-12T23:28:41Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- Spectral Analysis of Product Formulas for Quantum Simulation [0.0]
We show that the Trotter step size needed to estimate an energy eigenvalue within precision $\epsilon$ can be improved in scaling from $\epsilon$ to $\epsilon^{1/2}$ for a large class of systems.
Results partially generalize to diabatic processes, which remain in a narrow energy band separated from the rest of the spectrum by a gap.
arXiv Detail & Related papers (2021-02-25T03:17:25Z)
- Quantum Algorithms for Simulating the Lattice Schwinger Model [63.18141027763459]
We give scalable, explicit digital quantum algorithms to simulate the lattice Schwinger model in both NISQ and fault-tolerant settings.
In lattice units, we find a Schwinger model on $N/2$ physical sites with coupling constant $x^{-1/2}$ and electric field cutoff $x^{-1/2}\Lambda$.
We estimate observables which we cost in both the NISQ and fault-tolerant settings by assuming a simple target observable: the mean pair density.
arXiv Detail & Related papers (2020-02-25T19:18:36Z)
- Anisotropy-mediated reentrant localization [62.997667081978825]
We consider a 2d dipolar system, $d=2$, with the generalized dipole-dipole interaction $\sim r^{-a}$, and the power $a$ controlled experimentally in trapped-ion or Rydberg-atom systems.
We show that the spatially homogeneous tilt $\beta$ of the dipoles giving rise to the anisotropic dipole exchange leads to the non-trivial reentrant localization beyond the locator expansion.
arXiv Detail & Related papers (2020-01-31T19:00:01Z)