Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
- URL: http://arxiv.org/abs/2402.07626v2
- Date: Mon, 10 Jun 2024 10:25:14 GMT
- Title: Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
- Authors: Rodrigo Veiga, Anastasia Remizova, Nicolas Macris
- Abstract summary: We provide a general formula, in the small learning rate regime, for computing the difference between the test risk curves of pure gradient and stochastic gradient flows.
We explicitly compute the corrections brought about by the added stochastic term in the dynamics.
The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
- Score: 8.645858565518155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
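The comparison described in the abstract, between the test risk reached by pure (full-batch) gradient descent and by stochastic gradient descent on a weak-features model, can be illustrated with a minimal simulation. The sketch below is not the paper's computation: the random-feature data model, the dimensions, the learning rate, and the batch size are all illustrative assumptions.

```python
# Minimal sketch: compare test-risk trajectories of full-batch gradient descent
# (a discretisation of pure gradient flow) and mini-batch SGD on a random
# weak-features regression model. Sizes, learning rate, and noise level are
# illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 400, 100          # samples, input dimension, number of weak features
eta, steps, batch = 1e-2, 2000, 20

# Teacher and data: y = X w* / sqrt(d) + noise
w_star = rng.normal(size=d)
X_train, X_test = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y_train = X_train @ w_star / np.sqrt(d) + 0.1 * rng.normal(size=n)
y_test = X_test @ w_star / np.sqrt(d)

# Fixed random feature map ("weak features"): phi(x) = F x / sqrt(d)
F = rng.normal(size=(p, d))
Phi_train, Phi_test = X_train @ F.T / np.sqrt(d), X_test @ F.T / np.sqrt(d)

def test_risk(theta):
    return 0.5 * np.mean((Phi_test @ theta - y_test) ** 2)

def run(use_sgd):
    theta = np.zeros(p)
    risks = []
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False) if use_sgd else np.arange(n)
        Phi_b, y_b = Phi_train[idx], y_train[idx]
        grad = Phi_b.T @ (Phi_b @ theta - y_b) / len(idx)
        theta -= eta * grad
        risks.append(test_risk(theta))
    return np.array(risks)

risk_gd, risk_sgd = run(use_sgd=False), run(use_sgd=True)
print("final test risk  GD: %.4f   SGD: %.4f" % (risk_gd[-1], risk_sgd[-1]))
print("max |difference| along the trajectory: %.4f" % np.max(np.abs(risk_sgd - risk_gd)))
```

The quantity printed last, the gap between the two test-risk curves along the trajectory, is the object the paper characterises analytically in the small learning rate regime; the simulation only gives a numerical impression of it.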
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right, by which we mean using specific insights from the optimisation and kernel communities, gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or the adaptive gradient descent (a generic sketch of such a one-step correction is given after this list).
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Reservoir Computing with Error Correction: Long-term Behaviors of
Stochastic Dynamical Systems [5.815325960286111]
We propose a data-driven framework combining Reservoir Computing and Normalizing Flow to study the long-term behaviors of stochastic dynamical systems.
We verify the effectiveness of the proposed framework in several experiments, including the Van der Pol oscillator, a simplified El Nino-Southern Oscillation model, and the Lorenz system.
arXiv Detail & Related papers (2023-05-01T05:50:17Z) - Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic
Gradient Descent [1.2031796234206138]
We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime, called stochastic modified flows.
These SDEs are driven by a cylindrical Brownian motion and improve the so-called modified equations by having regular diffusion coefficients and by matching the multi-point statistics.
arXiv Detail & Related papers (2023-02-14T15:33:59Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Rigorous dynamical mean field theory for stochastic gradient descent
methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by
Normalizing Flows [29.310742141970394]
We introduce ImitationFlow, a novel deep generative model that allows learning complex, globally stable, nonlinear dynamics.
We show the effectiveness of our method with both standard datasets and a real robot experiment.
arXiv Detail & Related papers (2020-10-25T14:49:46Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but its role in that success is still unclear.
We show that multiplicative noise, arising from the variance of the stochastic gradients, commonly induces heavy tails in the parameters.
A detailed analysis is conducted describing how key factors, including step size and data, affect this behaviour, with similar results exhibited on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process which can be extended to a continuous-time stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
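As referenced in the entry on one-step corrected projected stochastic gradient descent above, the following is a minimal sketch of the general idea: obtain a pilot estimate by (projected) gradient ascent on the log-likelihood, then apply a single Fisher-scoring correction step. The logistic regression model, the full-batch gradients, the L2-ball projection, and all hyperparameters below are illustrative assumptions, not that paper's setup.

```python
# Hedged sketch of a one-step Fisher-scoring correction after projected
# gradient ascent on the log-likelihood (logistic regression, full batch).
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 5
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true)))

def score(theta):                      # gradient of the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (y - p)

def fisher(theta):                     # Fisher information matrix
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return (X * (p * (1 - p))[:, None]).T @ X

def project(theta, radius=10.0):       # projection onto an L2 ball (assumed constraint)
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

# Stage 1: projected gradient ascent on the log-likelihood (a crude pilot estimate)
theta = np.zeros(d)
for _ in range(200):
    theta = project(theta + 1e-3 * score(theta))

# Stage 2: a single Fisher-scoring correction step
theta_corrected = theta + np.linalg.solve(fisher(theta), score(theta))

print("pilot error    :", np.linalg.norm(theta - theta_true))
print("corrected error:", np.linalg.norm(theta_corrected - theta_true))
```

The single Newton-like step reuses the pilot estimate's curvature information, which is why one correction can already sharpen a rough gradient-based estimate; this is only a generic illustration of the one-step principle, not the cited paper's algorithm.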