Rigorous dynamical mean field theory for stochastic gradient descent
methods
- URL: http://arxiv.org/abs/2210.06591v3
- Date: Wed, 29 Nov 2023 15:00:42 GMT
- Title: Rigorous dynamical mean field theory for stochastic gradient descent
methods
- Authors: Cédric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent
Krzakala and Lenka Zdeborová
- Abstract summary: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
- Score: 17.90683687731009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of
a family of first order gradient-based methods, learning an estimator (e.g.
M-estimator, shallow neural network, ...) from observations on Gaussian data
with empirical risk minimization. This includes widely used algorithms such as
stochastic gradient descent (SGD) or Nesterov acceleration. The obtained
equations match those resulting from the discretization of dynamical mean-field
theory (DMFT) equations from statistical physics when applied to gradient flow.
Our proof method allows us to give an explicit description of how memory
kernels build up in the effective dynamics, and to include non-separable update
functions, allowing datasets with non-identity covariance matrices. Finally, we
provide numerical implementations of the equations for SGD with generic
extensive batch-size and with constant learning rates.
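To make the setting concrete, here is a minimal sketch of the kind of dynamics the equations describe: mini-batch SGD with an extensive batch size (a fixed fraction of the n samples) and a constant learning rate, fitting an estimator on Gaussian data. This is an illustrative toy, not the paper's DMFT solver; the logistic loss, the planted teacher, and all sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the DMFT equations describe the limit n, d -> infinity
# with n/d fixed.
d, n = 200, 600
batch_fraction = 0.5      # "extensive" batch: a fixed fraction of all samples
lr = 0.05                 # constant learning rate

# Gaussian data with a planted teacher (illustrative generative model).
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star)

def grad_logistic(w, Xb, yb):
    """Gradient of the logistic empirical risk on the batch (Xb, yb)."""
    z = yb * (Xb @ w)
    return -(Xb * (yb / (1.0 + np.exp(z)))[:, None]).mean(axis=0)

w = rng.standard_normal(d)                        # random initialization
b = int(batch_fraction * n)
for t in range(500):
    idx = rng.choice(n, size=b, replace=False)    # fresh extensive batch
    w -= lr * grad_logistic(w, X[idx], y[idx])

# Low-dimensional observables of the kind the DMFT equations track.
m = (w @ w_star) / d                              # overlap with the teacher
q = (w @ w) / d                                   # self-overlap
print(f"overlap m = {m:.3f}, self-overlap q = {q:.3f}")
```

In the high-dimensional limit, trajectories of observables such as m and q concentrate and are exactly characterized by the closed-form DMFT equations, including the memory kernels described above.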
Related papers
- DynGMA: a robust approach for learning stochastic differential equations from data [13.858051019755283]
We introduce novel approximations to the transition density of the parameterized SDE.
Our method exhibits superior accuracy compared to baseline methods in learning fully unknown drift and diffusion functions.
It is capable of handling data with low time resolution and variable, even uncontrollable, time step sizes.
arXiv Detail & Related papers (2024-02-22T12:09:52Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices (a minimal sketch of this dual objective appears after this list).
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance (an inverse-free gradient sketch appears after this list).
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Extracting Stochastic Governing Laws by Nonlocal Kramers-Moyal Formulas [3.8325907381729496]
We propose a data-driven approach to extract stochastic governing laws with both (Gaussian) Brownian motion and (non-Gaussian) Lévy motion.
We demonstrate that this approach can learn the stochastic differential equation with Lévy motion.
arXiv Detail & Related papers (2021-08-28T04:56:51Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient flow and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z) - Data Assimilation Networks [1.5545257664210517]
Data assimilation aims at forecasting the state of a dynamical system by combining a mathematical representation of the system with noisy observations.
We propose a fully data driven deep learning architecture generalizing recurrent Elman networks and data assimilation algorithms.
Our architecture achieves comparable performance to EnKF on both the analysis and the propagation of probability density functions of the system state at a given time without using any explicit regularization technique.
arXiv Detail & Related papers (2020-10-19T17:35:36Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process through which SGD can be extended to a continuous-time stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - A Data-Driven Approach for Discovering Stochastic Dynamical Systems with
Non-Gaussian Lévy Noise [5.17900889163564]
We develop a new data-driven approach to extract governing laws from noisy data sets.
First, we establish a feasible theoretical framework by expressing the drift coefficient, diffusion coefficient and jump measure.
We then design a numerical algorithm to compute the drift, diffusion coefficient and jump measure, and thus extract a governing equation with Gaussian and non-Gaussian noise.
arXiv Detail & Related papers (2020-05-07T21:29:17Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
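Below is a hedged sketch of the stochastic dual descent idea referenced in the "Stochastic Gradient Descent for Gaussian Processes Done Right" entry above: SGD on the dual objective of GP regression, whose minimizer alpha* = (K + noise*I)^{-1} y yields the posterior mean K alpha*. The RBF kernel, step size, and coordinate-block batching are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression data (illustrative).
n = 300
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(x) + 0.1 * rng.standard_normal(n)

def rbf_kernel(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = rbf_kernel(x, x)
noise = 0.1 ** 2

# Dual objective: g(alpha) = 0.5 * alpha^T (K + noise*I) alpha - alpha^T y,
# minimized at alpha* = (K + noise*I)^{-1} y; the GP posterior mean is K alpha*.
alpha = np.zeros(n)
lr, batch = 1e-2, 64      # step size kept well below 2 / lambda_max of a block
for t in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    # Gradient of the dual objective restricted to a random coordinate block.
    g = K[idx] @ alpha + noise * alpha[idx] - y[idx]
    alpha[idx] -= lr * g

residual = K @ alpha + noise * alpha - y      # zero at the exact dual optimum
print("max |dual residual|:", np.abs(residual).max())
```

Working in the dual keeps every update a cheap matrix-vector product and avoids forming or inverting the full kernel matrix.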
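Similarly, here is a hedged sketch of the inverse-free ingredient in the probabilistic unrolling entry above: the gradient of a Gaussian log-likelihood is estimated with conjugate-gradient solves plus Hutchinson Monte Carlo probes, so no matrix is ever inverted. The unrolling/backpropagation-through-the-solver part of that paper is not shown, and the covariance parameterization K(theta) = theta*S + noise*I and probe count are assumptions for illustration.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(2)

# Toy Gaussian model with covariance K(theta) = theta * S + noise * I.
n = 200
B = rng.standard_normal((n, n)) / np.sqrt(n)
S = B @ B.T + np.eye(n)          # fixed positive-definite base matrix
noise = 0.5
y = rng.standard_normal(n)

def grad_loglik(theta, n_probes=16):
    """Inverse-free d/dtheta of  -0.5*y^T K^{-1} y - 0.5*logdet(K).
    Linear systems are solved by conjugate gradients instead of forming
    K^{-1}; the log-det term tr(K^{-1} S) is replaced by a Hutchinson
    Monte Carlo estimate with Rademacher probe vectors."""
    K = theta * S + noise * np.eye(n)
    u, _ = cg(K, y)              # u = K^{-1} y via an iterative solver
    quad = 0.5 * u @ (S @ u)     # 0.5 * y^T K^{-1} S K^{-1} y
    trace = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        w, _ = cg(K, z)          # w = K^{-1} z
        trace += w @ (S @ z)     # unbiased sample of tr(K^{-1} S)
    return quad - 0.5 * trace / n_probes

print("estimated d(loglik)/d(theta) at theta=1:", grad_loglik(1.0))
```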
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.