Stochastic weight matrix dynamics during learning and Dyson Brownian motion
- URL: http://arxiv.org/abs/2407.16427v1
- Date: Tue, 23 Jul 2024 12:25:50 GMT
- Title: Stochastic weight matrix dynamics during learning and Dyson Brownian motion
- Authors: Gert Aarts, Biagio Lucini, Chanju Park
- Abstract summary: We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion.
We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence to a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.
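The connection can be illustrated with a minimal numerical sketch (ours, not the authors' code): adding i.i.d. symmetric Gaussian increments to a matrix makes its eigenvalues perform Dyson Brownian motion, and the late-time spectral density approaches the Wigner semicircle. In the paper the noise strength is set by the ratio of the learning rate and the mini-batch size; here a plain diffusion step `dt` stands in for that ratio.

```python
# Minimal sketch: eigenvalues of a symmetric matrix driven by additive
# Gaussian noise perform Dyson Brownian motion; compare the resulting
# spectral density with the Wigner semicircle. (Illustration only; the
# step size dt stands in for the learning-rate/mini-batch-size ratio.)
import numpy as np

rng = np.random.default_rng(0)
N, steps, dt = 200, 400, 1e-2

H = np.zeros((N, N))
for _ in range(steps):
    G = rng.normal(size=(N, N))
    H += np.sqrt(dt / N) * (G + G.T) / np.sqrt(2)  # symmetric (GOE-like) increment

T = steps * dt
lam = np.linalg.eigvalsh(H) / np.sqrt(T)  # rescale so the limiting support is [-2, 2]

# Wigner semicircle density rho(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2]
x = np.linspace(-2.0, 2.0, 201)
rho = np.sqrt(4.0 - x**2) / (2.0 * np.pi)

hist, edges = np.histogram(lam, bins=40, range=(-2.2, 2.2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |empirical - semicircle|:",
      np.max(np.abs(hist - np.interp(centers, x, rho, left=0.0, right=0.0))))
```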
Related papers
- Dyson Brownian motion and random matrix dynamics of weight matrices during learning [0.0]
We first demonstrate that the dynamics can generically be described using Dyson Brownian motion.
The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size.
We then study weight matrix dynamics in transformers, following the evolution of the eigenvalue spectrum from a Marchenko-Pastur distribution at initialisation to a combination of the Marchenko-Pastur distribution and additional structure at the end of learning.
arXiv Detail & Related papers (2024-11-20T18:05:39Z)
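As background for the initialisation stage mentioned in the entry above (our sketch, independent of the paper's transformer experiments): the eigenvalues of $W W^T / m$ for a random $n \times m$ weight matrix with i.i.d. entries follow the Marchenko-Pastur law with ratio $q = n/m$.

```python
# Sketch: empirical spectrum of W W^T / m at random initialisation versus
# the Marchenko-Pastur support [(1 - sqrt(q))^2, (1 + sqrt(q))^2], q = n/m.
import numpy as np

rng = np.random.default_rng(1)
n, m = 400, 1600
q = n / m

W = rng.normal(0.0, 1.0, size=(n, m))   # unit-variance i.i.d. entries
lam = np.linalg.eigvalsh(W @ W.T / m)

lo, hi = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
inside = np.mean((lam >= lo - 1e-9) & (lam <= hi + 1e-9))
print(f"fraction of eigenvalues in MP support [{lo:.3f}, {hi:.3f}]: {inside:.3f}")
```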
- Quantum tomography of helicity states for general scattering processes [55.2480439325792]
Quantum tomography has become an indispensable tool for computing the density matrix $\rho$ of quantum systems in physics.
We present the theoretical framework for reconstructing the helicity quantum initial state of a general scattering process.
arXiv Detail & Related papers (2023-10-16T21:23:42Z)
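For orientation, a toy instance of the general idea (single-qubit tomography by linear inversion; the paper's helicity-state framework is considerably more general):

```python
# Toy quantum tomography: reconstruct a single-qubit density matrix from
# measured Pauli expectations via rho = (I + <sx> sx + <sy> sy + <sz> sz)/2.
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def reconstruct(exp_x, exp_y, exp_z):
    """Density matrix from the three Pauli expectation values."""
    return 0.5 * (I2 + exp_x * sx + exp_y * sy + exp_z * sz)

# Expectations of the pure state |+> = (|0> + |1>)/sqrt(2): <sx> = 1, <sy> = <sz> = 0
rho = reconstruct(1.0, 0.0, 0.0)
print(np.real_if_close(np.trace(rho)))   # 1.0
print(np.linalg.eigvalsh(rho))           # [0., 1.] -- a pure state
```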
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
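The inverse-free ingredient can be sketched generically (our illustration; the paper additionally combines such solvers with Monte Carlo sampling and backpropagates through the unrolled iterations): a conjugate-gradient solver returns $A^{-1}b$ using only matrix-vector products, so no matrix is ever inverted explicitly.

```python
# Sketch: conjugate gradients solve A x = b for SPD A given only v -> A v,
# avoiding any explicit matrix inversion.
import numpy as np

def conjugate_gradient(matvec, b, iters=100, tol=1e-10):
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(2)
M = rng.normal(size=(50, 50))
A = M @ M.T + 50 * np.eye(50)  # well-conditioned SPD matrix
b = rng.normal(size=50)
x = conjugate_gradient(lambda v: A @ v, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```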
- Kernel Density Matrices for Probabilistic Deep Learning [8.486487001779416]
In quantum mechanics, a density matrix is the most general way to describe the state of a quantum system.
This paper introduces a novel approach to probabilistic deep learning, kernel density matrices.
It provides a simpler yet effective mechanism for representing joint probability distributions of both continuous and discrete random variables.
arXiv Detail & Related papers (2023-05-26T12:59:58Z)
- Discrete Lagrangian Neural Networks with Automatic Symmetry Discovery [3.06483729892265]
We introduce a framework to learn a discrete Lagrangian along with its symmetry group from discrete observations of motions.
The learning process does not restrict the form of the Lagrangian, does not require velocity or momentum observations or predictions, and incorporates a cost term.
arXiv Detail & Related papers (2022-11-20T00:46:33Z)
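For background (a minimal sketch of the discrete-mechanics machinery such methods build on, not the paper's learning code): a discrete Lagrangian $L_d(q_k, q_{k+1})$ determines trajectories through the discrete Euler-Lagrange equations $D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0$, and the residual below is what a learned $L_d$ must annihilate on observed motions.

```python
# Sketch: discrete Euler-Lagrange residual for an example discrete Lagrangian
# (midpoint rule, harmonic oscillator). A learned L_d would be trained so that
# this residual vanishes on observed position triples.
import numpy as np

def L_d(qa, qb, h=0.1, m=1.0, k=1.0):
    """Midpoint-rule discrete Lagrangian for a harmonic oscillator."""
    v = (qb - qa) / h
    qm = 0.5 * (qa + qb)
    return h * (0.5 * m * v**2 - 0.5 * k * qm**2)

def del_residual(q_prev, q, q_next, eps=1e-6):
    """D2 L_d(q_prev, q) + D1 L_d(q, q_next), via central finite differences."""
    d2 = (L_d(q_prev, q + eps) - L_d(q_prev, q - eps)) / (2 * eps)
    d1 = (L_d(q + eps, q_next) - L_d(q - eps, q_next)) / (2 * eps)
    return d2 + d1

# Samples of the exact trajectory q(t) = cos(t): residual is small, O(h^2)
print(del_residual(1.0, np.cos(0.1), np.cos(0.2)))  # ~1.7e-4
```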
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
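A generic sketch of the kind of geometry involved (our illustration using the affine-invariant metric on positive-definite matrices; the paper's objectives and fixed-rank geometry differ): a Riemannian gradient step moves along a geodesic and stays exactly on the SPD manifold.

```python
# Sketch: Riemannian gradient descent on the SPD manifold under the
# affine-invariant metric, X <- X^{1/2} expm(-lr X^{1/2} G X^{1/2}) X^{1/2},
# where G is the Euclidean gradient. The iterate is SPD by construction.
import numpy as np
from scipy.linalg import expm, sqrtm

def spd_step(X, egrad, lr=0.01):
    Xh = np.real(sqrtm(X))
    return Xh @ expm(-lr * Xh @ egrad @ Xh) @ Xh

# Toy objective f(X) = ||X - A||_F^2 with SPD target A; egrad = 2 (X - A)
rng = np.random.default_rng(3)
M = rng.normal(size=(5, 5))
A = M @ M.T / 5.0 + np.eye(5)
X = np.eye(5)
for _ in range(500):
    X = spd_step(X, 2.0 * (X - A))
print("distance to target:", np.linalg.norm(X - A))
```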
- Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs [52.52570805621925]
We investigate efficient learning from higher-order graph convolutions and learning directly from the adjacency matrix for node classification.
We show that the resulting model leads to new graphs and a residual scaling parameter.
We demonstrate that the proposed methods obtain improved accuracy for node classification on non-homophilous graphs.
arXiv Detail & Related papers (2022-09-12T04:46:55Z)
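The core operation can be sketched generically (our toy version, not the paper's exact architecture): a polynomial graph convolution $H = \sum_k \theta_k A^k X$ aggregates features from higher-order neighbourhoods directly through powers of the adjacency matrix.

```python
# Sketch: polynomial graph convolution, H = sum_k thetas[k] * A^k X, computed
# by repeated propagation instead of forming dense matrix powers.
import numpy as np

def poly_graph_conv(A, X, thetas):
    out = thetas[0] * X
    AX = X
    for theta in thetas[1:]:
        AX = A @ AX              # one more hop of propagation
        out = out + theta * AX
    return out

# Toy 4-node path graph with 2-dimensional node features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
print(poly_graph_conv(A, X, thetas=[0.5, 0.3, 0.2]))  # hops 0, 1 and 2
```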
- Learning Lattice Quantum Field Theories with Equivariant Continuous Flows [10.124564216461858]
We propose a novel machine learning method for sampling from the high-dimensional probability distributions of Lattice Field Theories.
We test our model on the $\phi^4$ theory, showing that it systematically outperforms previously proposed flow-based methods in sampling efficiency.
arXiv Detail & Related papers (2022-07-01T09:20:05Z)
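For context (our sketch with one standard discretisation; couplings and conventions may differ from the paper): the lattice $\phi^4$ action defines the Boltzmann weight $e^{-S[\phi]}/Z$ that such flow-based samplers learn to approximate.

```python
# Sketch: Euclidean lattice phi^4 action on a periodic 2D lattice,
# S = sum_x [ 1/2 sum_mu (phi(x+mu) - phi(x))^2 + 1/2 m2 phi(x)^2 + lam phi(x)^4 ].
import numpy as np

def phi4_action(phi, m2=1.0, lam=0.1):
    S = 0.0
    for mu in range(phi.ndim):
        d = np.roll(phi, -1, axis=mu) - phi   # forward difference, periodic
        S += 0.5 * np.sum(d**2)
    S += np.sum(0.5 * m2 * phi**2 + lam * phi**4)
    return S

rng = np.random.default_rng(4)
phi = rng.normal(size=(8, 8))
print("S[phi] =", phi4_action(phi))   # target density is exp(-S) / Z
```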
Schr\"odinger Equation [77.34726150561087]
We present a proof-of-concept machine learning model, resting on a convolutional neural network, capable of yielding accurate scattering s-wave phase shifts.
We discuss how the Hamiltonian can serve as a guiding principle in the construction of a physically-motivated descriptor.
arXiv Detail & Related papers (2021-06-25T17:25:38Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that multiplicative noise commonly arises in the parameter updates due to variance in the gradient estimates.
A detailed analysis is conducted in which we describe how key factors, including step size, batch size, and data, shape this behaviour, with state-of-the-art neural network models exhibiting similar results.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
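The mechanism can be illustrated with a classic toy model (ours, not the paper's experiments): a linear recursion with multiplicative noise develops heavy-tailed stationary behaviour whenever the multiplier occasionally exceeds one, even though the dynamics contract on average.

```python
# Sketch (Kesten-type recursion): x_{t+1} = a_t x_t + b_t with random a_t, b_t.
# With E[log |a_t|] < 0 but P(|a_t| > 1) > 0, the stationary law is heavy-tailed.
import numpy as np

rng = np.random.default_rng(5)
n_chains, burn_in = 20000, 2000

x = np.zeros(n_chains)
for _ in range(burn_in):
    a = rng.normal(0.7, 0.5, size=n_chains)   # multiplicative noise
    b = rng.normal(0.0, 1.0, size=n_chains)   # additive noise
    x = a * x + b

# Excess kurtosis far above the Gaussian value of 0 signals heavy tails
x2 = np.mean(x**2)
print("excess kurtosis:", np.mean(x**4) / x2**2 - 3.0)
```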
- A Dynamical Mean-Field Theory for Learning in Restricted Boltzmann Machines [2.8021833233819486]
We define a message-passing algorithm for computing magnetizations in Boltzmann machines.
We prove the global convergence of the algorithm under a stability criterion and compute convergence rates showing excellent agreement with numerical simulations.
arXiv Detail & Related papers (2020-05-04T15:19:31Z)
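For orientation (naive mean field, the simplest magnetization scheme; the paper's message-passing algorithm refines this and comes with a convergence analysis): magnetizations of a $\pm 1$ RBM can be iterated layer by layer.

```python
# Sketch: naive mean-field magnetizations in a binary (+/-1) RBM,
# m_hid = tanh(b_hid + W^T m_vis), m_vis = tanh(b_vis + W m_hid), iterated.
import numpy as np

rng = np.random.default_rng(6)
n_vis, n_hid = 20, 10
W = rng.normal(0.0, 1.0 / np.sqrt(n_vis), size=(n_vis, n_hid))
b_vis = rng.normal(0.0, 0.1, size=n_vis)
b_hid = rng.normal(0.0, 0.1, size=n_hid)

m_vis = np.zeros(n_vis)
for _ in range(200):
    m_hid = np.tanh(b_hid + W.T @ m_vis)
    m_vis_new = np.tanh(b_vis + W @ m_hid)
    if np.max(np.abs(m_vis_new - m_vis)) < 1e-10:
        break
    m_vis = m_vis_new
print("visible magnetizations:", np.round(m_vis, 3))
```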
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.