Fluctuation-dissipation Type Theorem in Stochastic Linear Learning
- URL: http://arxiv.org/abs/2106.02220v1
- Date: Fri, 4 Jun 2021 02:54:26 GMT
- Title: Fluctuation-dissipation Type Theorem in Stochastic Linear Learning
- Authors: Manhyung Han, Jeonghyeok Park, Taewoong Lee, Jung Hoon Han
- Abstract summary: The fluctuation-dissipation theorem (FDT) is a simple yet powerful consequence of the first-order differential equation governing the dynamics of systems subject simultaneously to dissipative and stochastic forces.
The linear learning dynamics, in which the input vector maps to the output vector by a linear matrix whose elements are the subject of learning, has a stochastic version closely mimicking the Langevin dynamics when a full-batch gradient descent scheme is replaced by that of stochastic gradient descent.
We derive a generalized FDT for the stochastic linear learning dynamics and verify its validity among the well-known machine learning data sets such as MNIST, CIFAR-10, and EMNIST.
- Score: 2.8292841621378844
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The fluctuation-dissipation theorem (FDT) is a simple yet powerful
consequence of the first-order differential equation governing the dynamics of
systems subject simultaneously to dissipative and stochastic forces. The linear
learning dynamics, in which the input vector maps to the output vector by a
linear matrix whose elements are the subject of learning, has a stochastic
version closely mimicking the Langevin dynamics when a full-batch gradient
descent scheme is replaced by that of stochastic gradient descent. We derive a
generalized FDT for the stochastic linear learning dynamics and verify its
validity among the well-known machine learning data sets such as MNIST,
CIFAR-10 and EMNIST.
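The Langevin analogy can be made concrete in a few lines of code. The sketch below is illustrative only, not the authors' implementation: it uses synthetic data in place of MNIST-style inputs, trains a linear map with mini-batch SGD, and checks the FDT-style expectation that steady-state weight fluctuations grow with the effective temperature set by the learning rate at fixed batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression standing in for an MNIST-style task:
# y = W_true @ x + noise, with W learned by mini-batch SGD.
d_in, d_out, n = 20, 5, 2000
X = rng.standard_normal((n, d_in))
W_true = rng.standard_normal((d_out, d_in))
Y = X @ W_true.T + 0.1 * rng.standard_normal((n, d_out))

def weight_fluctuations(lr, batch, steps=20000, burn_in=10000):
    """Mean steady-state variance of the weights under mini-batch SGD."""
    W = np.zeros((d_out, d_in))
    samples = []
    for t in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        xb, yb = X[idx], Y[idx]
        grad = (W @ xb.T - yb.T) @ xb / batch   # gradient of the half-MSE loss
        W -= lr * grad
        if t >= burn_in:
            samples.append(W.copy())
    return np.stack(samples).var(axis=0).mean()

# Langevin analogy: the effective temperature scales like lr/batch, so at
# fixed batch size the fluctuations should grow roughly linearly in lr.
for lr in (0.01, 0.02, 0.04):
    print(f"lr={lr:.2f}, batch=32 -> weight variance {weight_fluctuations(lr, 32):.2e}")
```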
Related papers
- Dyson Brownian motion and random matrix dynamics of weight matrices during learning [0.0]
We first demonstrate that the dynamics can generically be described using Dyson Brownian motion.
The level of stochasticity is shown to depend on the ratio of the learning rate to the mini-batch size.
We then study weight matrix dynamics in transformers following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
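A quick numerical check of the starting point (a hypothetical illustration, not the paper's experiments): the Gram spectrum of a Gaussian-initialised weight matrix matches the Marchenko-Pastur law, and SGD-like noise whose scale grows with the learning-rate-to-batch-size ratio then perturbs that spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian initialisation: eigenvalues of W @ W.T follow Marchenko-Pastur.
n_out, n_in = 200, 400
W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
eigs0 = np.linalg.eigvalsh(W @ W.T)

# Crude surrogate for SGD noise: Gaussian kicks whose variance scales with
# the learning-rate-to-batch-size ratio (the knob highlighted in the paper).
lr, batch = 0.05, 32
for _ in range(500):
    W += 0.01 * np.sqrt(lr / batch) * rng.standard_normal(W.shape)
eigs1 = np.linalg.eigvalsh(W @ W.T)

q = n_out / n_in
mp_edge = (1 + np.sqrt(q)) ** 2     # upper Marchenko-Pastur edge for ratio q
print(f"initial spectral edge {eigs0.max():.3f} vs MP prediction {mp_edge:.3f}")
print(f"spectral edge after noisy updates {eigs1.max():.3f}")
```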
arXiv Detail & Related papers (2024-11-20T18:05:39Z)
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
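For intuition about what estimating drift and diffusion means, here is a minimal sketch under simplifying assumptions (not the paper's estimator or its guarantees): with Euler-Maruyama sampling, conditional moments of the increments recover b(x) and sigma(x) by binned regression.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate dX = b(X) dt + sigma(X) dW with non-uniform diffusion, then recover
# b and sigma from the path via the moment relations
#   E[dX | X=x] ~ b(x) dt,   E[dX^2 | X=x] ~ sigma(x)^2 dt.
def b(x): return -x                           # Ornstein-Uhlenbeck-style drift
def sigma(x): return 0.5 + 0.2 * np.tanh(x)   # state-dependent diffusion

dt, n = 1e-3, 200_000
X = np.zeros(n)
for t in range(n - 1):
    X[t + 1] = X[t] + b(X[t]) * dt + sigma(X[t]) * np.sqrt(dt) * rng.standard_normal()

dX = np.diff(X)
bins = np.linspace(-1.0, 1.0, 9)
which = np.digitize(X[:-1], bins)
for k in (2, 4, 6):                           # a few interior bins
    mask = which == k
    x_mid = 0.5 * (bins[k - 1] + bins[k])
    b_hat = dX[mask].mean() / dt
    s_hat = np.sqrt((dX[mask] ** 2).mean() / dt)
    print(f"x={x_mid:+.2f}: drift {b_hat:+.2f} (true {b(x_mid):+.2f}), "
          f"diffusion {s_hat:.2f} (true {sigma(x_mid):.2f})")
```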
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
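The core recursion is easy to sketch. The following is a simplified scalar-output regression variant, not the authors' exact model: a random-walk transition on the predictor weights lets the filter track drift in the stream.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 10
mu, P = np.zeros(d), np.eye(d)   # posterior mean and covariance of the weights
q_drift, r_obs = 1e-3, 0.5       # transition noise (drift speed), observation noise

def kalman_step(mu, P, x, y):
    P = P + q_drift * np.eye(d)          # predict: w_t = w_{t-1} + noise
    s = x @ P @ x + r_obs                # innovation variance
    k = P @ x / s                        # Kalman gain
    mu = mu + k * (y - x @ mu)           # update with observation y ~ x @ w
    P = P - np.outer(k, x @ P)
    return mu, P

# Non-stationary stream: the true weights drift over time.
w = rng.standard_normal(d)
for t in range(2000):
    w += 0.01 * rng.standard_normal(d)
    x = rng.standard_normal(d)
    y = x @ w + np.sqrt(r_obs) * rng.standard_normal()
    mu, P = kalman_step(mu, P, x, y)

print(f"tracking error |mu - w| = {np.linalg.norm(mu - w):.3f}")
```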
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
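A minimal version of the idea (a sketch only; the paper's algorithm and analysis may differ): shuffle once per epoch and sweep disjoint minibatches, so every example is visited exactly once per pass, while the usual Langevin noise term is added to each update.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy linear-regression posterior sampled with SGLD.
n, d = 512, 5
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
lr, batch = 1e-3, 32
for epoch in range(200):
    perm = rng.permutation(n)            # one without-replacement pass per epoch
    for start in range(0, n, batch):
        idx = perm[start:start + batch]
        # Unbiased estimate of the full-loss gradient from a disjoint minibatch.
        grad = X[idx].T @ (X[idx] @ theta - y[idx]) * (n / batch)
        # Langevin update: gradient step plus sqrt(2*lr) Gaussian noise.
        theta += -lr * grad + np.sqrt(2 * lr) * rng.standard_normal(d)

print(f"posterior-sample distance to theta_true: {np.linalg.norm(theta - theta_true):.3f}")
```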
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
- Linearization and Identification of Multiple-Attractors Dynamical System through Laplacian Eigenmaps [8.161497377142584]
We propose a Graph-based spectral clustering method that takes advantage of a velocity-augmented kernel to connect data-points belonging to the same dynamics.
We prove that there always exists a set of 2-dimensional embedding spaces in which the sub-dynamics are linear, and an n-dimensional embedding space in which they are quasi-linear.
We learn a diffeomorphism from the Laplacian embedding space to the original space and show that the Laplacian embedding leads to good reconstruction accuracy and a faster training time.
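A toy version of the pipeline (hypothetical kernel and dynamics, for illustration only): connect points that are both close and moving in similar directions, then embed with the graph Laplacian so the two sub-dynamics separate.

```python
import numpy as np

# Velocity-augmented kernel: links points that are close AND move alike.
def velocity_kernel(X, V, sigma=0.5):
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    Vn = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-9)
    align = Vn @ Vn.T                       # cosine similarity of velocities
    return np.exp(-d2 / sigma**2) * np.clip(align, 0.0, None)

def trajectory(x0, A, steps=60, dt=0.05):
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * A @ xs[-1])
    return np.array(xs)

# Two linear sub-dynamics spiralling into different attractors.
A1 = np.array([[-0.5, -2.0], [2.0, -0.5]])
A2 = np.array([[-0.5, 2.0], [-2.0, -0.5]])
t1 = trajectory([2.0, 0.0], A1)
t2 = trajectory([-2.0, 0.0], A2) + np.array([6.0, 0.0])

X = np.vstack([t1[:-1], t2[:-1]])
V = np.vstack([np.diff(t1, axis=0), np.diff(t2, axis=0)])
K = velocity_kernel(X, V)
L = np.diag(K.sum(1)) - K                   # unnormalised graph Laplacian
_, U = np.linalg.eigh(L)
emb = U[:, 1:3]                             # 2-D embedding, skipping the trivial mode

m = len(t1) - 1
print("sub-dynamics 1 centroid:", emb[:m].mean(axis=0).round(3))
print("sub-dynamics 2 centroid:", emb[m:].mean(axis=0).round(3))
```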
arXiv Detail & Related papers (2022-02-18T12:43:25Z)
- Online Stochastic Gradient Descent Learns Linear Dynamical Systems from A Single Trajectory [1.52292571922932]
We show that if the unknown weight matrices describing the system are in Brunovsky canonical form, we can efficiently estimate the system's ground truth weight matrices.
Specifically, by deriving concrete bounds, we show that SGD converges linearly in expectation to any arbitrary small Frobenius norm distance from the ground truth weights.
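The basic estimation loop is compact. This sketch uses a generic stable system for illustration; the paper's guarantees concern Brunovsky canonical form.

```python
import numpy as np

rng = np.random.default_rng(5)

# Online SGD on a single trajectory of x_{t+1} = A x_t + noise,
# updating the estimate of A one transition at a time.
d = 4
A = 0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0]   # stable ground truth
A_hat = np.zeros((d, d))

x = rng.standard_normal(d)
lr = 0.05
for t in range(50000):
    x_next = A @ x + 0.1 * rng.standard_normal(d)
    # Gradient of the one-step prediction error 0.5 * ||A_hat @ x - x_next||^2.
    A_hat -= lr * np.outer(A_hat @ x - x_next, x)
    x = x_next

print(f"Frobenius distance to ground truth: {np.linalg.norm(A_hat - A, 'fro'):.4f}")
```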
arXiv Detail & Related papers (2021-02-23T17:48:39Z)
- ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows [29.310742141970394]
We introduce ImitationFlow, a novel Deep generative model that allows learning complex globally stable, nonlinear dynamics.
We show the effectiveness of our method with both standard datasets and a real robot experiment.
arXiv Detail & Related papers (2020-10-25T14:49:46Z)
- Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
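The forced-linear-system model reduces to one least-squares solve. Below is a bare-bones sketch of dynamic mode decomposition with a forcing term on synthetic data, not the authors' ensemble method.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate x_{t+1} = A x_t + B u_t + noise, with a periodic "load" forcing u.
T, d = 400, 2
u = np.sin(2 * np.pi * np.arange(T) / 50)[:, None]
A_true = np.array([[0.95, 0.1], [-0.1, 0.95]])
B_true = np.array([[0.5], [0.0]])
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + B_true @ u[t] + 0.01 * rng.standard_normal(d)

# Stack [x_t, u_t] and solve x_{t+1} = [A B] [x_t; u_t] in one least-squares pass.
Z = np.hstack([X[:-1], u[:-1]])
AB, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = AB.T[:, :d], AB.T[:, d:]
print("identified A:\n", A_hat.round(3))
print("identified B:\n", B_hat.round(3))
```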
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
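The building block can be sketched as follows (a simplified reading of the construction, not the reference implementation): each unit is a linear first-order ODE whose input-dependent nonlinear gate also modulates its effective time constant, which keeps the state bounded.

```python
import numpy as np

rng = np.random.default_rng(7)

n_hidden, n_in = 8, 3
W_in = 0.5 * rng.standard_normal((n_hidden, n_in))
W_rec = 0.1 * rng.standard_normal((n_hidden, n_hidden))
tau = 1.0

def ltc_step(x, u, dt=0.05):
    # Nonlinear synaptic gate computed from input and recurrent state.
    gate = 1.0 / (1.0 + np.exp(-(W_in @ u + W_rec @ x)))
    # The gate both drives the state and shortens the effective time
    # constant: dx/dt = -(1/tau + gate) * x + gate, so x stays in [0, 1).
    dx = -(1.0 / tau + gate) * x + gate
    return x + dt * dx                  # explicit Euler step

x = np.zeros(n_hidden)
for t in range(200):
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0])
    x = ltc_step(x, u)
print("hidden state stays bounded:", np.abs(x).max() < 1.0)
```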
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.