Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
- URL: http://arxiv.org/abs/2501.09659v1
- Date: Thu, 16 Jan 2025 16:54:40 GMT
- Title: Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
- Authors: Wei Bu, Uri Kol, Ziming Liu
- Abstract summary: We use the Fokker-Planck equation to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple two-bottleneck-layer autoencoder.
We also derive physically relevant partial differential equations, such as the Callan-Symanzik and Kardar-Parisi-Zhang equations, from the dynamical equation we obtain.
- Score: 9.257985820123
- Abstract: The dynamical evolution of a neural network during training has long been a fascinating subject of study. First-principles derivations of the generic evolution of variables in statistical-physics systems have proved useful for describing training dynamics conceptually, which in practice means numerically solving equations such as the Fokker-Planck equation. Simulating an entire network inevitably runs into the curse of dimensionality. In this paper, we use the Fokker-Planck equation to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple two-bottleneck-layer autoencoder, and we compare the theoretical evolution against the empirical one by examining the output data distributions. We also derive physically relevant partial differential equations, such as the Callan-Symanzik and Kardar-Parisi-Zhang equations, from the dynamical equation we obtain.
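To make the idea concrete, here is a minimal sketch of the kind of computation the abstract describes: evolving the probability density of a single scalar weight under a Fokker-Planck equation. The quadratic loss, grid, and noise strength below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch: evolve the density p(w, t) of one scalar weight under
#   dp/dt = d/dw [ p * dL/dw ] + T * d^2 p / dw^2,
# with a hypothetical quadratic loss L(w) = w^2 / 2, so dL/dw = w.
w = np.linspace(-4.0, 4.0, 401)        # weight-space grid
dw = w[1] - w[0]
dt = 0.2 * dw**2                        # small explicit step for stability
T = 0.1                                 # effective SGD noise temperature

p = np.exp(-((w - 2.0) ** 2) / 0.1)    # density initialized away from optimum
p /= np.trapz(p, w)

for _ in range(20000):
    drift_flux = np.gradient(p * w, dw)              # d/dw [p dL/dw]
    diffusion = np.gradient(np.gradient(p, dw), dw)  # d^2 p / dw^2
    p = np.clip(p + dt * (drift_flux + T * diffusion), 0.0, None)
    p /= np.trapz(p, w)                 # re-normalize against scheme error

# Late times should approach the Gibbs density ~ exp(-L(w)/T): variance -> T.
print("mean:", np.trapz(w * p, w), "variance:", np.trapz(w**2 * p, w))
```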
Related papers
- Dyson Brownian motion and random matrix dynamics of weight matrices during learning [0.0]
We first demonstrate that the dynamics can generically be described using Dyson Brownian motion.
The level of stochasticity is shown to depend on the ratio of the learning rate to the mini-batch size.
We then study weight matrix dynamics in transformers following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
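As a rough illustration of the Dyson Brownian motion picture: eigenvalues repel pairwise while being kicked by noise. The step size and noise scale below are illustrative stand-ins for the learning-rate/batch-size ratio, not the paper's values.

```python
import numpy as np

# Sketch: Dyson Brownian motion for n "eigenvalues" -- pairwise 1/(x_i - x_j)
# repulsion plus Gaussian noise.  dt and sigma are illustrative only.
rng = np.random.default_rng(0)
n, dt, sigma = 50, 1e-5, 1.0
lam = np.sort(rng.normal(size=n))          # initial eigenvalues

for _ in range(20000):
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)         # drop the self-interaction term
    repulsion = (1.0 / diff).sum(axis=1)   # sum_{j != i} 1 / (lam_i - lam_j)
    lam += repulsion * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    lam.sort()

print("spectral edges:", lam[0], lam[-1])  # repulsion broadens the spectrum
```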
arXiv Detail & Related papers (2024-11-20T18:05:39Z)
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled stochastic differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to the coefficients' regularity.
Our method is available as an open-source Python library.
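A minimal sketch of the general idea, though not the paper's estimator or its library: drift and diffusion can be read off from binned conditional moments of increments (a Kramers-Moyal-style estimate), demonstrated here on a simulated Ornstein-Uhlenbeck path.

```python
import numpy as np

# Sketch: binned conditional moments recover drift and diffusion,
#   E[dX | X=x] ~ b(x) dt,   E[dX^2 | X=x] ~ s(x)^2 dt,
# on a simulated Ornstein-Uhlenbeck path with b(x) = -x and s = 0.5.
rng = np.random.default_rng(1)
dt, n = 1e-3, 200_000
x = np.empty(n)
x[0] = 0.0
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] * dt + 0.5 * np.sqrt(dt) * rng.normal()

dx = np.diff(x)
bins = np.linspace(-1.0, 1.0, 21)
idx = np.digitize(x[:-1], bins)
for k in range(1, len(bins)):
    sel = idx == k
    if sel.sum() < 100:                    # skip sparsely visited bins
        continue
    c = 0.5 * (bins[k - 1] + bins[k])
    print(f"x={c:+.2f}  drift={dx[sel].mean() / dt:+.3f}"
          f"  diffusion^2={(dx[sel] ** 2).mean() / dt:.3f}")  # ~ -c and ~0.25
```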
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
- Neural stochastic Volterra equations: learning path-dependent dynamics [0.0]
Stochastic Volterra equations (SVEs) serve as mathematical models for the time evolution of random systems with memory effects and irregular behaviour.
We introduce neural stochastic Volterra equations as a physics-inspired architecture, generalizing the class of neural stochastic differential equations, and provide some theoretical foundation.
Numerical experiments on various SVEs, like the disturbed pendulum equation, the generalized Ornstein-Uhlenbeck process and the rough Heston model are presented.
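A minimal sketch of what a neural stochastic Volterra equation can look like under an Euler discretization; the network sizes and kernel parameterization are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Sketch of an Euler-discretized neural stochastic Volterra equation
#   X_t = X_0 + int_0^t K(t,s) b(X_s) ds + int_0^t K(t,s) g(X_s) dW_s,
# where the kernel K and coefficients b, g are small neural networks.
def mlp(din, dout):
    return nn.Sequential(nn.Linear(din, 32), nn.Tanh(), nn.Linear(32, dout))

b, g, K = mlp(1, 1), mlp(1, 1), mlp(2, 1)  # drift, diffusion, memory kernel

def simulate(x0, n_steps=100, dt=0.01):
    ts = torch.arange(n_steps) * dt
    dws = torch.randn(n_steps) * dt**0.5
    xs = [x0]
    for k in range(1, n_steps):
        t = ts[k].expand(k, 1)
        s = ts[:k].reshape(k, 1)
        kern = K(torch.cat([t, s], dim=1))     # K(t_k, t_j) for all j < k
        past = torch.stack(xs).reshape(k, 1)   # whole history enters each step
        incr = kern * (b(past) * dt + g(past) * dws[:k].reshape(k, 1))
        xs.append(x0 + incr.sum(dim=0))
    return torch.stack(xs)

path = simulate(torch.zeros(1))
print(path.shape)   # (100, 1); the path is differentiable w.r.t. all three nets
```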
arXiv Detail & Related papers (2024-07-28T18:44:49Z)
- HyperSINDy: Deep Generative Modeling of Nonlinear Stochastic Governing Equations [5.279268784803583]
We introduce HyperSINDy, a framework for modeling dynamics via a deep generative model of sparse governing equations from data.
Once trained, HyperSINDy generates stochastic dynamics via a differential equation whose coefficients are driven by Gaussian white noise.
In experiments, HyperSINDy recovers ground-truth governing equations, with learned stochasticity scaling to match that of the data.
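A minimal sketch of the generative step only (not the trained HyperSINDy model): coefficients of a sparse term library are resampled from white noise at every step, so each rollout draws a slightly different governing equation. The means and scales below are hypothetical placeholders for learned values.

```python
import numpy as np

# Sketch of the generative step: library coefficients are redrawn from white
# noise around (hypothetical) learned means at every step of the integration.
rng = np.random.default_rng(2)

def library(x):                        # candidate terms [1, x, x^2, x^3]
    return np.array([1.0, x, x**2, x**3])

coef_mean = np.array([0.0, -1.0, 0.0, -0.1])  # placeholder "learned" means
coef_std = np.array([0.0, 0.05, 0.0, 0.01])   # placeholder "learned" scales

x, dt = 1.5, 0.01
for _ in range(1000):
    coef = coef_mean + coef_std * rng.normal(size=4)  # white-noise-driven
    x += library(x) @ coef * dt                       # Euler step
print("final state:", x)               # decays under dx/dt ~ -x - 0.1 x^3
```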
arXiv Detail & Related papers (2023-10-07T14:41:59Z)
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
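A minimal sketch of SGLD with without-replacement minibatching on a toy quadratic loss (hyperparameters are illustrative):

```python
import numpy as np

# Sketch: SGLD where each epoch shuffles the data once and consumes disjoint
# minibatches (without replacement), with Langevin noise on every update.
rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=1024)   # toy regression targets
theta, lr, batch = 0.0, 1e-2, 32

for epoch in range(200):
    perm = rng.permutation(len(data))              # one pass, no replacement
    for i in range(0, len(data), batch):
        xb = data[perm[i:i + batch]]
        grad = np.mean(theta - xb)                 # gradient of 0.5*(theta-x)^2
        theta += -lr * grad + np.sqrt(2.0 * lr) * rng.normal()  # Langevin step
print("theta fluctuates around the data mean:", theta, data.mean())
```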
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
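A minimal sketch of the two-stage pattern, with a toy scalar "simulator" standing in for the model Hamiltonian: fit a differentiable surrogate once, then recover an unknown parameter by gradient descent through the frozen surrogate.

```python
import torch
import torch.nn as nn

def simulator(J):                       # toy stand-in for the model Hamiltonian
    x = torch.linspace(0.0, 3.14, 32)
    return torch.sin(J * x)             # "scattering data" parameterized by J

# Stage 1: train the surrogate once, over a range of parameters.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 32))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(3000):
    J = torch.rand(64, 1) * 2.0 + 0.5
    target = torch.stack([simulator(j[0]) for j in J])
    loss = ((net(J) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the surrogate and recover an unknown parameter by autodiff.
for p in net.parameters():
    p.requires_grad_(False)
measured = simulator(torch.tensor(1.3))        # pretend experimental data
J_hat = torch.tensor([1.0], requires_grad=True)
opt2 = torch.optim.Adam([J_hat], lr=1e-2)
for _ in range(500):
    loss = ((net(J_hat.reshape(1, 1)) - measured) ** 2).mean()
    opt2.zero_grad(); loss.backward(); opt2.step()
print("recovered parameter:", J_hat.item())    # should land near 1.3
```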
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations [68.62843292346813]
We propose a structured latent ODE model that captures system input variations within its latent representation.
Building on a static variable specification, our model learns factors of variation for each input to the system, thus separating the effects of the system inputs in the latent space.
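A minimal sketch of the structuring idea, with illustrative dimensions rather than the paper's architecture: each system input writes only into its own latent block, while a shared network sees the full latent state.

```python
import torch
import torch.nn as nn

# Sketch: per-input latent blocks.  Input u1 can only write into block z1 and
# u2 into block z2; a shared network handles the coupled internal dynamics.
f_shared = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 2))
f_u1 = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 2))
f_u2 = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 2))

def step(z, u1, u2, dt=0.05):           # one Euler step of the latent ODE
    z1, z2 = z[:, 2:4], z[:, 4:6]
    dz0 = f_shared(z)                   # shared dynamics see the full state
    dz1 = f_u1(torch.cat([z1, u1], dim=1))
    dz2 = f_u2(torch.cat([z2, u2], dim=1))
    return z + dt * torch.cat([dz0, dz1, dz2], dim=1)

z = torch.zeros(8, 6)                   # batch of structured latent states
for t in range(100):
    u1 = torch.full((8, 1), 0.5)        # constant test input
    u2 = torch.sin(torch.full((8, 1), 0.1 * t))
    z = step(z, u1, u2)
print(z.shape)                          # (8, 6)
```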
arXiv Detail & Related papers (2022-02-25T20:00:56Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
The Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newtonian mechanics systems with both fully and partially observed data.
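A toy illustration of the equivariance property such architectures are built around (not the EVFN layers themselves): a vector field assembled as an invariant scalar times the input vector rotates with its argument.

```python
import numpy as np

# Sketch: v(x) = f(|x|) * x is rotation-equivariant, v(R x) = R v(x), because
# it pairs a rotation-invariant scalar with the (equivariant) input vector.
def v(x):
    return np.exp(-np.dot(x, x)) * x    # invariant scalar times vector

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 2.0])
print(np.allclose(v(R @ x), R @ v(x)))  # True: the field rotates with x
```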
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Inference of Stochastic Dynamical Systems from Cross-Sectional Population Data [8.905677748354364]
Inferring the driving equations of a dynamical system from population or time-course data is important in several scientific fields such as biochemistry, epidemiology, financial mathematics and many others.
In this work, we deduce and then computationally estimate the Fokker-Planck equation, which describes the evolution of the population's probability density, based on the underlying stochastic differential equations.
Then, following the USDL approach, we project the Fokker-Planck equation to a proper set of test functions, transforming it into a linear system of equations.
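A minimal sketch of the projection idea (not the USDL implementation): each test function turns the Fokker-Planck equation into one linear equation for the drift parameters, with expectations estimated from population snapshots and the diffusion assumed known for brevity.

```python
import numpy as np

# Sketch: for a test function phi, the Fokker-Planck equation gives
#   d/dt E[phi(X)] = E[b(X) phi'(X)] + (s^2 / 2) E[phi''(X)].
# With drift b(x) = theta_1 + theta_2 * x and expectations estimated from two
# population snapshots, each phi yields one linear equation for theta.
rng = np.random.default_rng(4)
s, dt = 0.5, 0.005
x0 = rng.normal(0.0, 1.0, size=50_000)                 # snapshot at time t
x1 = x0 - x0 * dt + s * np.sqrt(dt) * rng.normal(size=x0.size)  # true b(x) = -x

tests = [  # (phi, phi', phi'')
    (lambda x: x, lambda x: np.ones_like(x), lambda x: np.zeros_like(x)),
    (lambda x: x**2, lambda x: 2 * x, lambda x: 2 * np.ones_like(x)),
]
A, rhs = [], []
for phi, dphi, d2phi in tests:
    lhs = (phi(x1).mean() - phi(x0).mean()) / dt       # d/dt E[phi]
    A.append([dphi(x0).mean(), (x0 * dphi(x0)).mean()])
    rhs.append(lhs - 0.5 * s**2 * d2phi(x0).mean())
theta = np.linalg.solve(np.array(A), np.array(rhs))
print("drift coefficients [theta_1, theta_2]:", theta)  # expect ~ [0, -1]
```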
arXiv Detail & Related papers (2020-12-09T14:02:29Z)
- Learning Stochastic Behaviour from Aggregate Data [52.012857267317784]
Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available.
We propose a novel method using the weak form of the Fokker-Planck equation (FPE) to describe the density evolution of data in a sampled form.
In such a sample-based framework we are able to learn the nonlinear dynamics from aggregate data without explicitly solving the partial differential equation (PDE) FPE.
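A minimal numerical sketch of the weak-form idea (illustrative, not the paper's method): snapshot samples stand in for the density, each test function contributes one residual, and the drift is fit by least squares over a bank of Gaussian bumps without discretizing the PDE on a grid.

```python
import numpy as np

# Sketch: every Gaussian bump test function psi contributes one residual of
# the weak-form FPE, estimated directly from snapshot samples -- no grid:
#   (E_t1[psi] - E_t0[psi]) / dt = E_t0[b(X) psi'(X)] + (s^2 / 2) E_t0[psi''(X)]
rng = np.random.default_rng(5)
s, dt = 0.5, 0.005
x0 = rng.normal(0.0, 1.0, size=100_000)
x1 = x0 + (x0 - x0**3) * dt + s * np.sqrt(dt) * rng.normal(size=x0.size)

feats = [np.ones_like, lambda x: x, lambda x: x**2, lambda x: x**3]
A, rhs = [], []
for c in np.linspace(-2.0, 2.0, 9):                    # bank of test functions
    psi = np.exp(-((x0 - c) ** 2))
    psi1 = np.exp(-((x1 - c) ** 2))
    dpsi = -2.0 * (x0 - c) * psi                       # psi'(x0)
    d2psi = (4.0 * (x0 - c) ** 2 - 2.0) * psi          # psi''(x0)
    A.append([(f(x0) * dpsi).mean() for f in feats])
    rhs.append((psi1.mean() - psi.mean()) / dt - 0.5 * s**2 * d2psi.mean())
theta, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
print("drift fit on [1, x, x^2, x^3]:", theta)         # expect ~ [0, 1, 0, -1]
```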
arXiv Detail & Related papers (2020-02-10T03:20:13Z)
- Physics Informed Deep Learning for Transport in Porous Media. Buckley-Leverett Problem [0.0]
We present a new hybrid physics-based machine-learning approach to reservoir modeling.
The methodology relies on a series of deep adversarial neural network architectures with physics-based regularization.
The proposed methodology is a simple and elegant way to instill physical knowledge into machine-learning algorithms.
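A minimal sketch of the physics-based regularization with a plain PINN residual rather than the paper's adversarial architecture: penalize the Buckley-Leverett transport residual, computed by autodiff, at random collocation points.

```python
import torch
import torch.nn as nn

# Sketch: a plain PINN for the Buckley-Leverett transport equation
#   s_t + f(s)_x = 0,   f(s) = s^2 / (s^2 + (1 - s)^2),
# with the PDE residual evaluated by autodiff at random collocation points.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64),
                    nn.Tanh(), nn.Linear(64, 1), nn.Sigmoid())

def residual(xt):                       # xt columns: (x, t)
    xt = xt.requires_grad_(True)
    s = net(xt)
    f = s**2 / (s**2 + (1.0 - s) ** 2)  # fractional-flow function
    ds = torch.autograd.grad(s.sum(), xt, create_graph=True)[0]
    df = torch.autograd.grad(f.sum(), xt, create_graph=True)[0]
    return ds[:, 1:2] + df[:, 0:1]      # s_t + f(s)_x

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    xt = torch.rand(256, 2)                                   # interior points
    left = torch.cat([torch.zeros(64, 1), torch.rand(64, 1)], dim=1)  # x = 0
    init = torch.cat([torch.rand(64, 1), torch.zeros(64, 1)], dim=1)  # t = 0
    loss = ((residual(xt) ** 2).mean()
            + ((net(left) - 1.0) ** 2).mean()   # injection condition s(0,t)=1
            + (net(init) ** 2).mean())          # initial condition  s(x,0)=0
    opt.zero_grad(); loss.backward(); opt.step()
print("physics residual loss:", loss.item())
```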
arXiv Detail & Related papers (2020-01-15T08:20:11Z)