Analysis of Stochastic Gradient Descent in Continuous Time
- URL: http://arxiv.org/abs/2004.07177v3
- Date: Sun, 10 Jan 2021 14:07:57 GMT
- Title: Analysis of Stochastic Gradient Descent in Continuous Time
- Authors: Jonas Latz
- Abstract summary: We introduce the stochastic gradient process as a continuous-time representation of stochastic gradient descent.
We show that it converges weakly to the gradient flow as the learning rate approaches zero.
When the learning rate decreases to zero sufficiently slowly and the single target functions are strongly convex, the process converges weakly to the point mass concentrated in the global minimum of the full target function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent is an optimisation method that combines classical
gradient descent with random subsampling within the target functional. In this
work, we introduce the stochastic gradient process as a continuous-time
representation of stochastic gradient descent. The stochastic gradient process
is a dynamical system that is coupled with a continuous-time Markov process
living on a finite state space. The dynamical system -- a gradient flow --
represents the gradient descent part, the process on the finite state space
represents the random subsampling. Processes of this type are, for instance,
used to model clonal populations in fluctuating environments. After introducing
it, we study theoretical properties of the stochastic gradient process: We show
that it converges weakly to the gradient flow with respect to the full target
function, as the learning rate approaches zero. We give conditions under which
the stochastic gradient process with constant learning rate is exponentially
ergodic in the Wasserstein sense. Then we study the case where the learning
rate goes to zero sufficiently slowly and the single target functions are
strongly convex. In this case, the process converges weakly to the point mass
concentrated in the global minimum of the full target function, indicating
consistency of the method. We conclude after a discussion of discretisation
strategies for the stochastic gradient process and numerical experiments.
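To make the coupled construction concrete, here is a minimal simulation sketch (an illustration, not the paper's code): quadratic sub-targets f_i(x) = (x - a_i)^2 / 2, an index process that resamples uniformly after exponentially distributed waiting times with the learning rate as mean holding time (an assumed parameterisation), and an explicit Euler discretisation of the gradient flow between jumps.

```python
import numpy as np

# Minimal sketch of the stochastic gradient process (illustrative, not the
# paper's code): between jumps of the index process i(t), x follows the
# gradient flow dx/dt = -grad f_i(x) of the sub-target f_i(x) = (x - a_i)^2 / 2;
# the index jumps to a uniform new state after Exp(eta) waiting times.
rng = np.random.default_rng(0)

N = 5
a = rng.normal(size=N)              # minimisers of the sub-targets f_i
eta = 0.05                          # learning rate / mean holding time
dt, T = 1e-3, 20.0                  # Euler step and time horizon

x, t = 3.0, 0.0
i = rng.integers(N)                 # current state of the index process
next_jump = rng.exponential(eta)

while t < T:
    if t >= next_jump:              # index process jumps: new sub-target
        i = rng.integers(N)
        next_jump = t + rng.exponential(eta)
    x -= dt * (x - a[i])            # Euler step of dx/dt = -grad f_i(x)
    t += dt

# For small eta the path tracks the gradient flow of the full target,
# whose minimiser is mean(a) for these quadratics.
print(f"x(T) = {x:.4f}, minimiser of full target = {a.mean():.4f}")
```

Shrinking eta makes the index process switch faster, and the path averages toward the gradient flow of the full target, mirroring the weak-convergence statement in the abstract.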
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the stochastic gradient descent algorithm can be characterized via the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
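As a companion illustration (an assumed setup, not the paper's construction), the following sketch simulates an Ornstein-Uhlenbeck-type process driven by a symmetric alpha-stable Lévy process, the natural driver in the infinite-variance regime; theta, alpha, and the Euler scheme are all illustrative choices.

```python
import numpy as np
from scipy.stats import levy_stable

# Illustrative sketch: Euler scheme for dX_t = -theta * X_t dt + dL_t with a
# symmetric alpha-stable Levy driver L (alpha < 2, hence infinite variance).
rng = np.random.default_rng(1)
theta, alpha = 1.0, 1.5
dt, n = 1e-2, 10_000

# stable increments over a step of length dt scale like dt**(1/alpha)
dL = levy_stable.rvs(alpha, 0.0, scale=dt ** (1 / alpha), size=n, random_state=rng)

x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):
    x[k + 1] = x[k] - theta * x[k] * dt + dL[k]

print("tail of the heavy-tailed OU path:", np.round(x[-3:], 3))
```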
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
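The dual view can be made concrete with a small sketch (a reading of the general idea, not the authors' algorithm): stochastic gradient steps on random coordinate blocks of the dual quadratic, whose minimiser gives the GP posterior-mean weights. The RBF kernel, data, and step size below are assumptions.

```python
import numpy as np

# Sketch of stochastic (block-coordinate) descent on the GP regression dual
# objective 0.5 * a^T (K + s2*I) a - a^T y; its minimiser a* solves
# (K + s2*I) a* = y, i.e. the posterior-mean weights.  Setup is illustrative.
rng = np.random.default_rng(5)
n = 100
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

s2 = 0.1
K = np.exp(-0.5 * (X - X.T) ** 2)            # RBF kernel matrix
A = K + s2 * np.eye(n)

a = np.zeros(n)
lr = 1.0 / np.linalg.eigvalsh(A).max()       # conservative step size
for _ in range(3000):
    idx = rng.choice(n, size=20, replace=False)   # random coordinate block
    a[idx] -= lr * (A[idx] @ a - y[idx])          # partial gradient step

# more iterations drive this residual toward zero
print("residual vs. direct solve:", np.linalg.norm(a - np.linalg.solve(A, y)))
```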
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual stochastic gradient descent with averaging or the adaptive gradient descent.
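A hedged sketch of the recipe (the model and names are illustrative assumptions): run projected gradient ascent on the log-likelihood, then apply a single Fisher-scoring correction. For a unit-variance Gaussian mean the correction lands exactly on the MLE.

```python
import numpy as np

# Illustrative model: X_1..X_n ~ N(theta, 1).  The score is sum(x - theta)
# and the Fisher information is n, so one scoring step from any theta gives
# theta + score/n = mean(x), the MLE.  Everything here is an assumed example.
rng = np.random.default_rng(2)
x = rng.normal(loc=1.3, scale=1.0, size=500)
n = x.size

def score(theta):                       # gradient of the log-likelihood
    return np.sum(x - theta)

def project(theta, lo=-10.0, hi=10.0):  # projection onto the parameter set
    return float(np.clip(theta, lo, hi))

theta, lr = 0.0, 1e-4
for _ in range(10):                     # a few projected gradient steps
    theta = project(theta + lr * score(theta))

theta_corrected = theta + score(theta) / n   # one Fisher-scoring step

print(f"after gradient steps: {theta:.4f}, after correction: "
      f"{theta_corrected:.4f}, MLE: {x.mean():.4f}")
```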
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
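The reverse-process machinery rests on simulating continuous-time Markov chains. The sketch below is a generic Gillespie simulation with an assumed uniform generator, not the paper's learned reverse-time rates; it only shows the basic mechanism.

```python
import numpy as np

# Gillespie simulation of a continuous-time Markov chain on K states with a
# uniform generator Q (rows sum to zero).  The generator is a toy choice; a
# discrete diffusion model would learn the reverse-time jump rates instead.
rng = np.random.default_rng(3)
K = 4
Q = np.full((K, K), 1.0 / (K - 1))
np.fill_diagonal(Q, -1.0)

state, t, T = 0, 0.0, 5.0
path = [(round(t, 3), state)]
while True:
    rate = -Q[state, state]
    t += rng.exponential(1.0 / rate)      # exponential holding time
    if t > T:
        break
    probs = np.clip(Q[state], 0.0, None) / rate
    state = int(rng.choice(K, p=probs))   # jump per the off-diagonal rates
    path.append((round(t, 3), state))

print("first jumps:", path[:5])
```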
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, rather than a fixed number of discrete gradient steps.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
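The memory advantage of forward-mode differentiation through the ODE can be seen in a scalar sketch (a toy loss, not COMLN itself): the tangent ds/dt = -L''(w) s is propagated alongside the state, so memory stays constant however long the trajectory runs.

```python
import numpy as np

# Toy forward-mode sensitivity through a gradient-flow ODE (assumed example):
# state dynamics  dw/dt = -grad L(w)  with L(w) = 0.5 * (w - c)^2, and the
# tangent s = dw/dw0 obeying ds/dt = -L''(w) * s.  For this quadratic loss
# the exact sensitivity is exp(-T), which the Euler scheme should approach.
c = 2.0
grad = lambda w: w - c          # L'(w)
hess = lambda w: 1.0            # L''(w), constant here

w, s = 0.0, 1.0                 # initial state and sensitivity dw/dw0
dt, T = 1e-2, 1.0
for _ in range(int(T / dt)):
    s -= dt * hess(w) * s       # tangent step (uses w before it is updated)
    w -= dt * grad(w)           # state step

print(f"w(T) = {w:.4f}, dw(T)/dw0 = {s:.4f}, exact exp(-T) = {np.exp(-T):.4f}")
```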
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - A Continuous-time Stochastic Gradient Descent Method for Continuous Data [0.0]
We study a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data.
We study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime.
We end with illustrating the applicability of the gradient process in a regression problem with noisy functional data, as well as in a physics-informed neural network.
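A minimal sketch of the continuous-data setting (the sampling pattern below is an assumed toy, not one of the paper's): the data index performs a Brownian motion on the unit torus while the state follows the gradient flow of the sub-target it currently sees.

```python
import numpy as np

# Assumed toy setup: sub-targets f(x; u) = 0.5 * (x - g(u))^2 indexed by a
# continuous data variable u, with g(u) = sin(2*pi*u).  The index u performs
# a Brownian motion on [0, 1) (torus); x follows dx/dt = -grad_x f(x; u_t).
rng = np.random.default_rng(6)
g = lambda u: np.sin(2 * np.pi * u)

x, u = 0.0, 0.5
dt, T = 1e-3, 50.0
for _ in range(int(T / dt)):
    u = (u + np.sqrt(dt) * rng.normal()) % 1.0   # Brownian data index
    x -= dt * (x - g(u))                         # Euler step of the gradient flow

# The full target averages f over u in [0, 1); its minimiser is the mean of g,
# which is 0 here, so x(T) should hover near 0.
print(f"x(T) = {x:.4f}")
```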
arXiv Detail & Related papers (2021-12-07T15:09:24Z) - Unbiased Estimation of the Gradient of the Log-Likelihood for a Class of Continuous-Time State-Space Models [0.0]
We consider static parameter estimation for a class of continuous-time state-space models.
Our goal is to obtain an unbiased estimate of the gradient of the log-likelihood (score function).
arXiv Detail & Related papers (2021-05-24T20:31:48Z) - Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost Functions [2.538209532048867]
As a byproduct, we show that the gradient converges and provide an explicit upper bound on the convergence rate.
This allows us to prove almost sure convergence via Doob's supermartingale convergence theorem.
arXiv Detail & Related papers (2020-12-01T16:48:59Z) - Understanding and Detecting Convergence for Stochastic Gradient Descent
with Momentum [18.88380216480131]
This paper considers stochastic gradient descent with a constant learning rate and momentum.
We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients.
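The diagnostic idea fits in a few lines (an illustrative variant; the paper's exact statistic and threshold differ): far from stationarity successive stochastic gradients are positively correlated, while in the stationary phase their inner product has negative mean, so a running sum of inner products that turns negative signals convergence.

```python
import numpy as np

# Illustrative convergence diagnostic for SGD on the assumed toy objective
# 0.5 * |w - c|^2 with Gaussian gradient noise: accumulate inner products of
# successive stochastic gradients; a sign flip of the running sum is taken
# as the stationarity signal.
rng = np.random.default_rng(4)
c = np.array([1.0, -2.0])

def stoch_grad(w):
    return (w - c) + 0.5 * rng.normal(size=2)

w = np.zeros(2)
lr, prev_g, stat = 0.1, None, 0.0
for k in range(1, 5001):
    g = stoch_grad(w)
    if prev_g is not None:
        stat += g @ prev_g              # running sum of successive inner products
        if k > 100 and stat < 0:        # burn-in, then test the sign
            print(f"stationary phase detected at iteration {k}")
            break
    prev_g = g
    w -= lr * g
```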
arXiv Detail & Related papers (2020-08-27T16:24:18Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
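For orientation, a standard background fact (not the paper's second-order scheme): the Fokker-Planck equation is the Wasserstein gradient flow of the relative entropy KL(rho || pi), and a cloud of Langevin particles simulates it. The sketch below checks this on an assumed Gaussian target.

```python
import numpy as np

# Langevin particles dX = -grad U(X) dt + sqrt(2) dW approximate the
# Wasserstein gradient flow of KL(rho || pi) for pi proportional to exp(-U).
# Toy target: U(x) = 0.5 * x^2, i.e. a standard Gaussian.
rng = np.random.default_rng(7)
grad_U = lambda x: x

n, dt, steps = 5000, 1e-2, 2000
x = rng.uniform(-5, 5, size=n)            # initial particle ensemble rho_0
for _ in range(steps):
    x += -grad_U(x) * dt + np.sqrt(2 * dt) * rng.normal(size=n)

print(f"sample mean {x.mean():.3f}, sample variance {x.var():.3f} (target: 0, 1)")
```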
arXiv Detail & Related papers (2019-10-31T02:26:20Z)