Random Matrix Theory for Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2412.20496v1
- Date: Sun, 29 Dec 2024 15:21:13 GMT
- Title: Random Matrix Theory for Stochastic Gradient Descent
- Authors: Chanju Park, Matteo Favoni, Biagio Lucini, Gert Aarts
- Abstract summary: Investigating the dynamics of learning in machine learning algorithms is of paramount importance for understanding how and why an approach may be successful.
Here we apply concepts from random matrix theory to describe weight matrix dynamics, using the framework of Dyson Brownian motion.
We derive the linear scaling rule between the learning rate (step size) and the batch size, and identify universal and non-universal aspects of weight matrix dynamics.
- Abstract: Investigating the dynamics of learning in machine learning algorithms is of paramount importance for understanding how and why an approach may be successful. The tools of physics and statistics provide a robust setting for such investigations. Here we apply concepts from random matrix theory to describe stochastic weight matrix dynamics, using the framework of Dyson Brownian motion. We derive the linear scaling rule between the learning rate (step size) and the batch size, and identify universal and non-universal aspects of weight matrix dynamics. We test our findings in the (near-)solvable case of the Gaussian Restricted Boltzmann Machine and in a linear one-hidden-layer neural network.
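To make the scaling statement concrete, here is a minimal numerical sketch (not the authors' code): a symmetric matrix performs an Ornstein-Uhlenbeck-like walk in which the drift models the mean gradient and the noise models mini-batch fluctuations of standard deviation eps/sqrt(B). The "teacher" matrix `A`, the Gaussian noise model, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def eigenvalue_jitter(eps, B, n=20, steps=20000):
    """Toy SGD-like matrix walk: the drift -eps*(H - A) pulls H toward
    a fixed 'teacher' matrix A, while symmetric Gaussian noise of
    standard deviation eps/sqrt(B) mimics mini-batch gradient noise.
    Returns the stationary std of the largest eigenvalue."""
    A = np.diag(np.linspace(-1.0, 1.0, n))
    H = A.copy()
    top = []
    for t in range(steps):
        X = rng.normal(size=(n, n))
        xi = (X + X.T) / np.sqrt(2.0)            # symmetric noise, unit variance
        H = H - eps * (H - A) + eps / np.sqrt(B) * xi
        if t > steps // 2:                       # discard the transient
            top.append(np.linalg.eigvalsh(H)[-1])
    return np.std(top)

# The stationary fluctuations depend on eps and B only through eps/B:
print(eigenvalue_jitter(eps=0.01, B=8))    # baseline
print(eigenvalue_jitter(eps=0.02, B=16))   # eps and B doubled together: similar
print(eigenvalue_jitter(eps=0.02, B=8))    # eps doubled alone: visibly larger
```

In this toy model the discrete Ornstein-Uhlenbeck recursion has stationary entry variance (eps^2/B)/(1-(1-eps)^2) ≈ eps/(2B), which is exactly the eps/B dependence behind the linear scaling rule.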
Related papers
- Dyson Brownian motion and random matrix dynamics of weight matrices during learning [0.0]
We first demonstrate that the dynamics can generically be described using Dyson Brownian motion.
The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size.
We then study weight matrix dynamics in transformers following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
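The initialisation end of that evolution is easy to check numerically (a generic sketch assuming i.i.d. Gaussian initialisation, not the transformer experiments of the paper; the matrix sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 1000, 2000                        # weight matrix shape, ratio q = n/m
W = rng.normal(scale=1.0 / np.sqrt(m), size=(n, m))
evals = np.linalg.eigvalsh(W @ W.T)      # spectrum of the Wishart matrix W W^T

q = n / m
lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2   # Marchenko-Pastur edges

def mp_density(x):
    """Marchenko-Pastur density for unit-variance entries and ratio q."""
    return np.sqrt(np.maximum((hi - x) * (x - lo), 0.0)) / (2 * np.pi * q * x)

hist, edges = np.histogram(evals, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[centers[::10], hist[::10], mp_density(centers[::10])])  # rough agreement
```

Training then deforms this bulk spectrum, which is the additional structure the entry refers to.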
arXiv Detail & Related papers (2024-11-20T18:05:39Z)
- Stochastic weight matrix dynamics during learning and Dyson Brownian motion [0.0]
We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion.
We discuss universal and non-universal features in the Coulomb gas distribution of eigenvalues and identify the Wigner surmise and the Wigner semicircle explicitly in a teacher-student model.
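Both statistics named here can be reproduced from a plain GOE sample (a generic random-matrix sketch, not the teacher-student experiment; the crude mean-spacing unfolding is a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
X = rng.normal(size=(n, n))
H = (X + X.T) / np.sqrt(2 * n)       # GOE normalisation: semicircle on [-2, 2]
ev = np.linalg.eigvalsh(H)

# Nearest-neighbour spacings from the central bulk, crudely unfolded
# by the mean bulk spacing; compare with the GOE Wigner surmise
bulk = ev[n // 4 : 3 * n // 4]
s = np.diff(bulk)
s /= s.mean()

surmise = lambda x: (np.pi * x / 2) * np.exp(-np.pi * x**2 / 4)
hist, edges = np.histogram(s, bins=40, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[centers[::8], hist[::8], surmise(centers[::8])])  # close agreement
```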
arXiv Detail & Related papers (2024-07-23T12:25:50Z)
- Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning [49.0767291348921]
Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems.
This article provides a self-contained overview of RandNLA in light of recent developments in the area.
arXiv Detail & Related papers (2024-06-17T02:30:55Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
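The inverse-free ingredient can be illustrated in isolation (a minimal sketch of the general Monte-Carlo-plus-iterative-solver idea, using Hutchinson probes and conjugate gradients; this is not the paper's estimator, and the unrolling/backpropagation step is omitted):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(3)

def trace_inv_times(Sigma, A, num_probes=64):
    """Estimate tr(Sigma^{-1} A) = E[z^T Sigma^{-1} A z] with Rademacher
    probes z; Sigma^{-1} is applied via conjugate gradients, so the
    covariance is only ever touched through matrix-vector products."""
    n = Sigma.shape[0]
    op = LinearOperator((n, n), matvec=lambda v: Sigma @ v)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        x, info = cg(op, A @ z)          # solve Sigma x = A z; no inverse formed
        total += z @ x
    return total / num_probes

# Toy check against an explicit solve
n = 200
Q = rng.normal(size=(n, n))
Sigma = Q @ Q.T + n * np.eye(n)          # well-conditioned SPD covariance
B0 = rng.normal(size=(n, n))
A = B0 @ B0.T / n                        # PSD test matrix
print(trace_inv_times(Sigma, A), np.trace(np.linalg.solve(Sigma, A)))  # close
```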
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying an integrity metric: the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Learning a Compressive Sensing Matrix with Structural Constraints via Maximum Mean Discrepancy Optimization [17.104994036477308]
We introduce a learning-based algorithm to obtain a measurement matrix for compressive-sensing-related recovery problems.
The recent success of metrics such as the maximum mean discrepancy (MMD) in neural-network applications motivates a machine-learning-based solution to this problem.
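For readers unfamiliar with the metric, here is a self-contained sketch of an unbiased MMD^2 estimator with a Gaussian kernel (sample sizes, bandwidth, and data are illustrative; the measurement-matrix learning loop of the paper is not reproduced):

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased U-statistic estimator of the squared maximum mean
    discrepancy between samples X (n,d) and Y (m,d), Gaussian kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    # drop diagonal terms to make the within-sample averages unbiased
    xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return xx + yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(4)
X = rng.normal(size=(256, 3))
Y = rng.normal(loc=0.5, size=(256, 3))
print(mmd2_unbiased(X[:128], X[128:]))  # same distribution: close to 0
print(mmd2_unbiased(X, Y))              # shifted distribution: clearly positive
```

Because the estimator is differentiable in the samples, it can serve directly as a training loss.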
arXiv Detail & Related papers (2021-10-14T08:35:54Z)
- Shallow Representation is Deep: Learning Uncertainty-aware and Worst-case Random Feature Dynamics [1.1470070927586016]
This paper views uncertain system models as unknown or uncertain smooth functions in universal reproducing kernel Hilbert spaces.
By directly approximating the one-step dynamics function using random features with uncertain parameters, we then view the whole dynamical system as a multi-layer neural network.
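A concrete instance of approximating a one-step dynamics function with random features is sketched below, without the uncertainty part: random Fourier features approximate a Gaussian-kernel regressor. The dynamics `f_true`, the dimensions, and the ridge penalty are illustrative assumptions; the paper's uncertainty-aware and worst-case analysis is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)

d, D = 2, 500                       # state dimension, number of random features
sigma = 1.0                         # kernel bandwidth
Omega = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    """Random Fourier features: phi(x) @ phi(y) ~ exp(-|x-y|^2 / (2 sigma^2))."""
    return np.sqrt(2.0 / D) * np.cos(x @ Omega.T + b)

def f_true(x):                      # "unknown" dynamics used to generate data
    return 0.9 * x + 0.1 * np.sin(x)

# Fit the one-step map x_{t+1} = f(x_t) as a linear map in feature space
X = rng.uniform(-2, 2, size=(400, d))
Y = f_true(X)
Phi = phi(X)
lam = 1e-3                          # ridge penalty
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ Y)

x0 = np.array([1.0, -0.5])
print(phi(x0) @ W, f_true(x0))      # learned vs true one-step map: close
```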
arXiv Detail & Related papers (2021-06-24T14:48:12Z)
- Efficient construction of tensor-network representations of many-body Gaussian states [59.94347858883343]
We present a procedure to construct tensor-network representations of many-body Gaussian states efficiently and with a controllable error.
These states include the ground and thermal states of bosonic and fermionic quadratic Hamiltonians, which are essential in the study of quantum many-body systems.
arXiv Detail & Related papers (2020-08-12T11:30:23Z)
- Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning, including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning.
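For flavour, here is what a geometric gradient step on O(d) can look like: a generic Cayley-retraction update (a standard construction, not necessarily the update proposed in the paper; the toy objective is an assumption):

```python
import numpy as np

def cayley_step(W, G, eta):
    """One gradient step on the orthogonal group O(d).
    W: current orthogonal matrix; G: Euclidean gradient dL/dW.
    Builds a skew-symmetric generator and applies the Cayley
    transform, which is orthogonal, so W stays exactly on O(d)."""
    A = G @ W.T - W @ G.T                   # skew-symmetric: A.T == -A
    I = np.eye(W.shape[0])
    Q = np.linalg.solve(I + 0.5 * eta * A, I - 0.5 * eta * A)
    return Q @ W

rng = np.random.default_rng(6)
d = 8
W, _ = np.linalg.qr(rng.normal(size=(d, d)))     # random point on O(d)
M0 = rng.normal(size=(d, d))
M = M0 @ M0.T / d                                # toy objective: maximise tr(W^T M)

for _ in range(500):
    W = cayley_step(W, G=-M, eta=0.05)           # gradient of the loss -tr(W^T M)

print(np.allclose(W @ W.T, np.eye(d)))           # True: never left the group
print(np.trace(W.T @ M), np.trace(M))            # loss value approaches tr(M)
```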
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
- Understanding the dynamics of message passing algorithms: a free probability heuristics [2.8021833233819486]
We analyze the behavior of inference algorithms for probabilistic models with dense coupling matrices in the limit of large systems.
For a toy Ising model, we are able to recover previous results such as the property of vanishing effective memories and the analytical convergence rate of the algorithm.
arXiv Detail & Related papers (2020-02-03T19:50:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.