Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
- URL: http://arxiv.org/abs/2002.06189v2
- Date: Mon, 2 Nov 2020 16:37:14 GMT
- Title: Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
- Authors: Lingkai Kong and Molei Tao
- Abstract summary: This article suggests that deterministic Gradient Descent, which does not use any stochastic gradient approximation, can still exhibit stochastic behaviors.
It shows that if the objective function exhibits multiscale behaviors, then in a large learning rate regime which only resolves the macroscopic but not the microscopic details of the objective, the deterministic GD dynamics can become chaotic.
- Score: 14.46779433267854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article suggests that deterministic Gradient Descent, which does not use
any stochastic gradient approximation, can still exhibit stochastic behaviors.
In particular, it shows that if the objective function exhibits multiscale
behaviors, then in a large learning rate regime which only resolves the
macroscopic but not the microscopic details of the objective, the deterministic
GD dynamics can become chaotic and convergent not to a local minimizer but to a
statistical distribution. A sufficient condition is also established for
approximating this long-time statistical limit by a rescaled Gibbs
distribution. Both theoretical and numerical demonstrations are provided, and
the theoretical part relies on the construction of a stochastic map that uses
bounded noise (as opposed to discretized diffusions).
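To make the claim concrete, here is a minimal numerical sketch, assuming a toy 1-D multiscale objective f(x) = x^2/2 + eps*cos(x/eps) of our own choosing (the paper's construction is more general): with a learning rate that resolves the macroscopic quadratic but not the eps-scale oscillation, deterministic GD wanders chaotically while its empirical distribution stabilizes.

```python
import numpy as np

# A minimal sketch (illustrative, not the paper's exact construction):
# multiscale objective f(x) = F(x) + eps * cos(x / eps), with macroscopic
# part F(x) = x^2 / 2 and microscopic eps-scale oscillations.
eps = 1e-3
grad_f = lambda x: x - np.sin(x / eps)   # f'(x) = F'(x) - sin(x / eps)

h = 0.1   # "large" learning rate: resolves F but not the eps-scale details
x = 1.0
iterates = []
for k in range(300_000):
    x -= h * grad_f(x)                   # plain deterministic GD, no noise
    if k >= 100_000:                     # discard the transient
        iterates.append(x)

iterates = np.asarray(iterates)
# GD does not settle at the minimizer x = 0; the iterates wander
# chaotically while their empirical distribution stabilizes.
print("last iterates :", np.round(iterates[-4:], 4))
print("empirical mean:", round(iterates.mean(), 4))
print("empirical std :", round(iterates.std(), 4))
```

Shrinking the learning rate down to the microscale (here roughly h on the order of eps) restores ordinary convergence to a local minimizer, which is the regime separation the abstract describes.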
Related papers
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise (a toy forward-noising sketch follows this entry).
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
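As a rough illustration of the forward process analyzed above, the following sketch simulates Ornstein-Uhlenbeck noising of a bimodal distribution with modes at distance R; the dynamics, R, and step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward Ornstein-Uhlenbeck noising dX_t = -X_t dt + sqrt(2) dW_t, whose
# stationary law is N(0, 1). Bimodal "data" with modes at distance R from
# the origin; R, dt and T are illustrative choices, not the paper's.
R, dt, T = 5.0, 1e-2, 8.0
x = rng.choice([-R, R], size=100_000) + 0.1 * rng.standard_normal(100_000)

for _ in range(int(T / dt)):              # Euler-Maruyama discretisation
    x += -x * dt + np.sqrt(2 * dt) * rng.standard_normal(x.size)

# The farther the modes (larger R), the longer the forward run needed
# before the noised samples are close to the N(0, 1) reference.
print("mean %.3f  var %.3f" % (x.mean(), x.var()))
```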
- User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
We are able to sample conditionally on nonlinear user-defined events at inference time, and match data statistics even when sampling from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z)
- Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation [2.53740603524637]
We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation procedure.
In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure.
Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles (a toy sketch of such an interacting particle update follows this entry).
arXiv Detail & Related papers (2023-03-23T16:50:08Z)
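The following toy sketch shows one common form of an interacting particle Langevin update on a conjugate Gaussian latent-variable model; the model and the exact noise scalings are illustrative assumptions rather than the paper's algorithm verbatim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-variable model (illustrative, not the paper's example):
#   x ~ N(theta, 1),  y | x ~ N(x, 1)  =>  marginal MLE is theta* = y.
# One common form of an interacting particle Langevin update: the parameter
# is driven by the particle average, with its noise damped by 1/N.
y, N, h = 2.0, 100, 0.01
theta, x = 0.0, rng.standard_normal(N)

for _ in range(20_000):
    grad_theta = np.mean(x - theta)           # average of grad_theta log p(x, y)
    grad_x = (y - x) - (x - theta)            # grad_x log p(x, y), per particle
    theta += h * grad_theta + np.sqrt(2 * h / N) * rng.standard_normal()
    x += h * grad_x + np.sqrt(2 * h) * rng.standard_normal(N)

print("theta ~=", round(theta, 3), "(MLE is", y, ")")
```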
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency (a minimal sketch of the probabilistic representation follows this entry).
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
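The probabilistic representation mentioned above can be sketched for 1-D convection-diffusion via the Feynman-Kac formula; the coefficients and initial condition below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feynman-Kac representation for the 1-D convection-diffusion equation
#   u_t + c u_x = D u_xx,  u(x, 0) = u0(x):
#   u(x, t) = E[ u0(x - c t + sqrt(2 D t) Z) ],  Z ~ N(0, 1).
# c, D, t and u0 are illustrative choices for this sketch.
c, D, t = 1.0, 0.1, 0.5
u0 = lambda x: np.exp(-x**2)

def u(x, n=200_000):
    z = rng.standard_normal(n)
    return u0(x - c * t + np.sqrt(2 * D * t) * z).mean()

# In the paper such Monte Carlo evaluations provide training targets for an
# unsupervised neural solver; here we only evaluate the representation.
print([round(u(x), 4) for x in (-1.0, 0.0, 1.0)])
```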
- Mathematical analysis of singularities in the diffusion model under the submanifold assumption [0.0]
The drift term of the backward sampling process is represented as a conditional expectation involving the data distribution and the forward diffusion.
The training process aims to find such a drift function by minimizing the mean-squared residue related to the conditional expectation.
We show that the analytical mean drift function in DDPM and the score function in SGM asymptotically blow up in the final stages of the sampling process for singular data distributions.
arXiv Detail & Related papers (2023-01-19T05:13:03Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks (a toy sketch of continuous-time discrete corruption follows this entry).
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
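A minimal sketch of continuous-time discrete corruption, assuming a uniform-rate jump process (an illustrative rate matrix, not necessarily the paper's choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward corruption as a continuous-time Markov chain on K symbols: each
# symbol jumps at rate 1 to a uniformly chosen symbol (possibly itself).
# This uniform-rate generator is an illustrative choice for the sketch.
K, T = 4, 1.5
seq = rng.integers(K, size=12)                # toy "clean" discrete data
print("clean :", seq)

# Exact transition kernel of this CTMC at time T: keep the symbol with
# probability e^{-T}, otherwise resample it uniformly over the K symbols.
keep = rng.random(seq.size) < np.exp(-T)
noised = np.where(keep, seq, rng.integers(K, size=seq.size))
print("noised:", noised)
# A reverse-time CTMC, parametrised by a learned conditional distribution,
# would denoise such samples back toward the data distribution.
```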
- Stochastic Langevin Differential Inclusions with Applications to Machine Learning [5.274477003588407]
We show some foundational results regarding the flow and properties of Langevin-type Differential Inclusions.
In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
arXiv Detail & Related papers (2022-06-23T08:29:17Z)
- Concentration analysis of multivariate elliptic diffusion processes [0.0]
We prove concentration inequalities and associated PAC bounds for continuous- and discrete-time additive functionals.
Our analysis relies on an approach via the Poisson equation allowing us to consider a very broad class of subexponentially ergodic processes.
arXiv Detail & Related papers (2022-06-07T14:15:05Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent [3.0079490585515343]
Stochastic gradient descent (SGD) is relatively well understood in the vanishing learning rate regime.
We propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime (a toy illustration of finite-learning-rate fluctuations follows this entry).
arXiv Detail & Related papers (2020-12-07T12:31:43Z)
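A toy illustration of the non-vanishing learning rate regime: SGD on a 1-D quadratic with additive gradient noise, where the stationary fluctuation scale depends on the learning rate h. The model and noise scale are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# SGD on a 1-D quadratic with additive gradient noise; at finite learning
# rate h the iterates fluctuate around the minimum instead of converging.
def stationary_std(h, sigma=1.0, steps=100_000):
    x, xs = 0.0, []
    for t in range(steps):
        g = x + sigma * rng.standard_normal()   # noisy gradient of x^2 / 2
        x -= h * g
        if t > steps // 2:                      # keep post-transient iterates
            xs.append(x)
    return np.std(xs)

for h in (0.5, 0.1, 0.02):                      # non-vanishing learning rates
    print(h, round(stationary_std(h), 3))       # std shrinks roughly like sqrt(h)
```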
- Fast approximations in the homogeneous Ising model for use in scene analysis [61.0951285821105]
We provide accurate approximations that make it possible to numerically calculate quantities needed in inference.
We show that our approximation formulae are scalable and unfazed by the size of the Markov Random Field.
The practical import of our approximation formulae is illustrated in performing Bayesian inference in a functional Magnetic Resonance Imaging activation detection experiment, and also in likelihood ratio testing for anisotropy in the spatial patterns of yearly increases in pistachio tree yields.
arXiv Detail & Related papers (2017-12-06T14:24:34Z)