From SGD to Spectra: A Theory of Neural Network Weight Dynamics
- URL: http://arxiv.org/abs/2507.12709v1
- Date: Thu, 17 Jul 2025 01:06:39 GMT
- Title: From SGD to Spectra: A Theory of Neural Network Weight Dynamics
- Authors: Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula
- Abstract summary: Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed 'bulk+tail' spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous foundation for understanding why deep learning works.
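For orientation, the generic Dyson-type form such an SDE takes for the squared singular values is sketched below; the potential V, coupling c, and noise scale σ are schematic stand-ins, not the paper's exact drift and diffusion coefficients.

```latex
% Generic Dyson Brownian motion for squared singular values \lambda_i.
% The 1/(\lambda_i - \lambda_j) term is the eigenvalue repulsion the
% abstract refers to; V, c, \sigma are schematic, not the paper's terms.
d\lambda_i = \Big( \sum_{j \neq i} \frac{c}{\lambda_i - \lambda_j}
             - \partial_{\lambda_i} V(\lambda_i) \Big)\, dt + \sigma\, dB_i
```

In the paper's setting the drift and diffusion would be determined by the loss gradient and the SGD noise; the repulsion term is what distinguishes this from independent scalar diffusions.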
Related papers
- Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs [0.0]
We aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. The results indicate tractable NTK behavior for cPIKANs, exposing learning dynamics that standard physics-informed neural networks (PINNs) cannot capture.
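For readers new to the object under analysis, here is a minimal sketch of computing one empirical NTK entry with autograd; the tiny MLP is an illustrative stand-in, not a cKAN or cPIKAN.

```python
# Minimal empirical NTK entry, Theta(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>.
# The network is an illustrative stand-in, not a cKAN/cPIKAN.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def grad_vec(x):
    """Flattened gradient of the scalar output f(x) w.r.t. all parameters."""
    out = net(x).squeeze()
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

x1, x2 = torch.tensor([[0.1]]), torch.tensor([[0.7]])
print(float(grad_vec(x1) @ grad_vec(x2)))  # one kernel entry
```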
arXiv Detail & Related papers (2025-06-09T17:30:13Z) - Models of Heavy-Tailed Mechanistic Universality [62.107333654304014]
We propose a family of random matrix models to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power-law tails arise through a combination of three independent factors. Implications of our model for other appearances of heavy tails, including neural scaling laws, trajectories, and the five-plus-one phases of neural network training, are discussed.
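As a toy illustration of one such route to heavy tails (not the paper's model): heavy-tailed matrix entries alone already push eigenvalues far beyond the Marchenko-Pastur bulk.

```python
# Compare top eigenvalues of a Gaussian Wishart matrix with one built from
# heavy-tailed (Student-t, df=3) entries; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 900

def top_eigs(entries):
    W = (entries @ entries.T) / m
    return np.sort(np.linalg.eigvalsh(W))[-5:]

print(top_eigs(rng.normal(size=(n, m))))            # stays in the MP bulk
print(top_eigs(rng.standard_t(df=3, size=(n, m))))  # outliers far beyond it
```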
arXiv Detail & Related papers (2025-06-04T00:55:01Z) - Machine Learning-Enhanced Characterisation of Structured Spectral Densities: Leveraging the Reaction Coordinate Mapping [41.94295877935867]
Spectral densities encode essential information about system-environment interactions in open quantum systems. We leverage machine learning techniques to reconstruct key environmental features using the reaction coordinate mapping. For a dissipative spin-boson model with a structured spectral density expressed as a sum of Lorentzian peaks, we demonstrate that the time evolution of a system observable can be used by a neural network to classify the spectral density as comprising one, two, or three Lorentzian peaks.
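For concreteness, a minimal sketch of such a structured spectral density as a sum of Lorentzian peaks; the functional form below is the standard underdamped-oscillator one, and the parameters are illustrative, not taken from the paper.

```python
# J(w) as a sum of Lorentzian peaks; (lam, gamma, w0) per peak are illustrative.
import numpy as np

def lorentzian_spectral_density(omega, peaks):
    J = np.zeros_like(omega, dtype=float)
    for lam, gamma, w0 in peaks:
        J += lam * gamma * omega / ((omega**2 - w0**2)**2 + gamma**2 * omega**2)
    return J

omega = np.linspace(0.0, 5.0, 1000)
J = lorentzian_spectral_density(omega, peaks=[(1.0, 0.1, 1.0), (0.5, 0.2, 2.5)])
```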
arXiv Detail & Related papers (2025-01-13T17:02:04Z) - Dyson Brownian motion and random matrix dynamics of weight matrices during learning [0.0]
We first demonstrate that the dynamics can generically be described using Dyson Brownian motion.
The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size.
We then study weight matrix dynamics in transformers following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
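The initial condition is easy to reproduce numerically; the sketch below (illustrative sizes, not the paper's transformers) draws a Gaussian weight matrix and the Marchenko-Pastur bulk its squared singular values should follow at initialisation.

```python
# Squared singular values of a Gaussian-initialised weight matrix vs. the
# Marchenko-Pastur bulk edges; n, m are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 1000
W = rng.normal(size=(n, m)) / np.sqrt(m)
eigs = np.linalg.svd(W, compute_uv=False)**2

q = n / m
lo, hi = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
x = np.linspace(lo, hi, 500)
mp_density = np.sqrt((hi - x) * (x - lo)) / (2 * np.pi * q * x)
print(eigs.min(), lo, eigs.max(), hi)  # empirical extremes track the MP edges
```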
arXiv Detail & Related papers (2024-11-20T18:05:39Z) - Approaching Deep Learning through the Spectral Dynamics of Weights [41.948042468042374]
We use the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to clarify and unify several phenomena in deep learning.
We identify a consistent bias in optimization across various experiments, from small-scale "grokking" to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers.
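The underlying measurement is straightforward to set up; below is a minimal sketch with a toy linear-regression task standing in for the large-scale experiments.

```python
# Log singular values of a weight matrix across SGD steps (toy task).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
Y = X @ rng.normal(size=(32, 16))      # planted linear teacher
W = 0.1 * rng.normal(size=(32, 16))    # student weights
lr, spectra = 0.01, []

for step in range(500):
    batch = rng.choice(len(X), size=32, replace=False)
    grad = X[batch].T @ (X[batch] @ W - Y[batch]) / 32  # grad of 0.5*MSE
    W -= lr * grad
    if step % 50 == 0:
        spectra.append(np.linalg.svd(W, compute_uv=False))
```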
arXiv Detail & Related papers (2024-08-21T17:48:01Z) - Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions [2.3020018305241337]
We analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs. Under certain standardization schemes, the deep learning model converges toward Gaussian-setting behavior, even when the input data follow more complex or real-world distributions.
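A toy version of the input model, assuming a two-component mixture for concreteness; the standardization step is the kind of scheme the abstract refers to.

```python
# Sample a two-component Gaussian mixture and standardise per coordinate.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
comp = rng.integers(0, 2, size=5000)          # mixture component per sample
X = means[comp] + rng.normal(size=(5000, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # zero mean, unit variance
```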
arXiv Detail & Related papers (2024-05-01T17:10:55Z) - Fourier Neural Differential Equations for learning Quantum Field Theories [57.11316818360655]
A Quantum Field Theory is defined by its interaction Hamiltonian, and linked to experimental data by the scattering matrix.
In this paper, NDE models are used to learn φ^4 theory, Scalar-Yukawa theory, and Scalar Quantum Electrodynamics.
The interaction Hamiltonian of a theory can be extracted from network parameters.
arXiv Detail & Related papers (2023-11-28T22:11:15Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
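A hedged sketch of that workflow, with a randomly initialised stand-in network and placeholder data in place of the trained surrogate and experimental measurements.

```python
# Recover unknown parameters by gradient descent through a frozen,
# differentiable surrogate. Network and data are placeholders.
import torch

torch.manual_seed(0)
surrogate = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 100))  # params -> signal
data = torch.randn(100)                                    # placeholder "experiment"

params = torch.zeros(2, requires_grad=True)                # unknown parameters
opt = torch.optim.Adam([params], lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    loss = torch.mean((surrogate(params) - data)**2)
    loss.backward()                                        # autodiff through surrogate
    opt.step()
```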
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
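For context, the standard linearized (NTK) picture behind spectral bias is summarised below; this is the textbook gradient-flow result, not the paper's momentum analysis.

```latex
% With NTK eigenpairs (\lambda_i, v_i), outputs u(t) on the training set,
% and targets y, the residual decays mode by mode under gradient flow:
u(t) - y = \sum_i e^{-\eta \lambda_i t} \,\langle u(0) - y,\; v_i \rangle \, v_i
% Small-\lambda_i (typically high-frequency) modes converge slowest;
% momentum changes the per-mode dynamics, which is the effect studied here.
```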
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD [105.99301967452334]
We describe the learning dynamics of neural networks trained with normalization, weight decay (WD), and SGD (with momentum), which we name Spherical Motion Dynamics (SMD).
We verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings.
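A quick numerical check of the scale invariance that produces such spherical dynamics; the layer below is a generic normalised linear map, not the paper's architecture.

```python
# With normalisation after the linear map, rescaling W leaves the output
# unchanged, so only the direction of W matters -- hence "spherical" motion.
import numpy as np

rng = np.random.default_rng(0)
W, x = rng.normal(size=(8, 4)), rng.normal(size=4)

def normed_layer(W, x):
    h = W @ x
    return (h - h.mean()) / h.std()

print(np.allclose(normed_layer(W, x), normed_layer(3.0 * W, x)))  # True
```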
arXiv Detail & Related papers (2020-06-15T14:16:33Z) - Beyond Random Matrix Theory for Deep Networks [0.7614628596146599]
We investigate whether the Wigner semi-circle and Marchenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities.
We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions.
We consider two new classes of matrix ensembles: random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match the observed spectra.
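A toy sample from a Wishart-product-style ensemble, with illustrative sizes rather than the paper's fitted parameters.

```python
# Eigenvalues of a product of two Wishart matrices spread far beyond a
# single Marchenko-Pastur bulk; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 600
X1, X2 = rng.normal(size=(n, m)), rng.normal(size=(n, m))
P = (X1 @ X1.T) @ (X2 @ X2.T) / m**2
eigs = np.linalg.eigvals(P).real  # product of PSD matrices: real, nonnegative
print(eigs.min(), eigs.max())
```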
arXiv Detail & Related papers (2020-06-13T21:00:30Z)