Hessian-Free High-Resolution Nesterov Acceleration for Sampling
- URL: http://arxiv.org/abs/2006.09230v4
- Date: Sat, 18 Jun 2022 01:50:00 GMT
- Title: Hessian-Free High-Resolution Nesterov Acceleration for Sampling
- Authors: Ruilin Li, Hongyuan Zha, Molei Tao
- Abstract summary: Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
- Score: 55.498092486970364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nesterov's Accelerated Gradient (NAG) for optimization has better performance
than its continuous time limit (noiseless kinetic Langevin) when a finite
step-size is employed \citep{shi2021understanding}. This work explores the
sampling counterpart of this phenomenon and proposes a diffusion process, whose
discretizations can yield accelerated gradient-based MCMC methods. More
precisely, we reformulate the optimizer of NAG for strongly convex functions
(NAG-SC) as a Hessian-Free High-Resolution ODE, change its high-resolution
coefficient to a hyperparameter, inject appropriate noise, and discretize the
resulting diffusion process. The acceleration effect of the new hyperparameter
is quantified, and it is shown not to be an artifact created by time-rescaling.
Instead, acceleration beyond underdamped Langevin in $W_2$ distance is
quantitatively established for log-strongly-concave-and-smooth targets, at both
the continuous dynamics level and the discrete algorithm level. Empirical
experiments in both log-strongly-concave and multi-modal cases also numerically
demonstrate this acceleration.
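As an illustration of the construction described above (kinetic Langevin plus an extra tunable high-resolution-style coefficient, with noise injected so that the target measure is preserved), below is a minimal Python sketch. It is only one plausible reading of the abstract, not the authors' exact HFHR dynamics or discretization: it augments the position equation of underdamped Langevin with an overdamped drift/noise pair of strength alpha, which keeps exp(-f(x) - |v|^2/2) invariant, and uses a plain Euler-Maruyama step. The target, the function names, and the parameter values (alpha, gamma, h) are hypothetical illustration choices.

import numpy as np

def grad_f(x):
    # Gradient of f(x) = 0.5 * x^T A x, a toy strongly log-concave target
    # (condition number 10). Any smooth, strongly convex f would do here.
    A = np.diag([1.0, 10.0])
    return A @ x

def accelerated_langevin_sketch(n_steps=5000, h=0.01, alpha=1.0, gamma=2.0, seed=0):
    # Euler-Maruyama sketch (illustrative, not the paper's scheme) of
    #   dx = (v - alpha * grad f(x)) dt + sqrt(2 alpha) dW1
    #   dv = (-gamma * v - grad f(x)) dt + sqrt(2 gamma) dW2
    # Both pieces leave exp(-f(x) - |v|^2/2) invariant; alpha = 0 recovers a
    # plain discretization of underdamped Langevin, so alpha plays the role
    # of the extra acceleration knob described in the abstract.
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    v = np.zeros(2)
    samples = np.empty((n_steps, 2))
    for k in range(n_steps):
        x = x + h * (v - alpha * grad_f(x)) + np.sqrt(2.0 * alpha * h) * rng.standard_normal(2)
        v = v + h * (-gamma * v - grad_f(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal(2)
        samples[k] = x
    return samples

if __name__ == "__main__":
    s = accelerated_langevin_sketch()
    # The empirical covariance of x should approach A^{-1} = diag(1, 0.1).
    print(np.cov(s[1000:].T))

Running the sketch with alpha = 0 and with alpha > 0 on the same target is a simple way to compare mixing behavior and see the effect of the extra coefficient numerically.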
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD with clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for
Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Accelerating Convergence in Global Non-Convex Optimization with
Reversible Diffusion [0.0]
Langevin dynamics has been used extensively in global non-convex optimization.
Our proposed method is used to investigate the trade-off between speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z) - Revisiting the acceleration phenomenon via high-resolution differential
equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for the $\mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme (the high-resolution ODE in question is recalled in the sketch after this list).
arXiv Detail & Related papers (2022-12-12T04:36:37Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - A modified limited memory Nesterov's accelerated quasi-Newton [0.0]
The Nesterov's accelerated quasi-Newton (L)NAQ method has been shown to accelerate the conventional (L)BFGS quasi-Newton method.
The Momentum accelerated Quasi-Newton (MoQ) method showed that the Nesterov's accelerated gradient can be approximated as a linear combination of past gradients.
This abstract extends the MoQ approximation to limited memory NAQ and evaluates the performance on a function approximation problem.
arXiv Detail & Related papers (2021-12-01T06:48:47Z) - Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction [24.794221009364772]
We study variance reduction for noisy energy estimators, which enables much more effective analysis.
We obtain the state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.
arXiv Detail & Related papers (2020-10-02T16:23:35Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
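For reference, and to connect the entry on revisiting the acceleration phenomenon above with the main abstract: to the best of our reading of \citep{shi2021understanding}, the high-resolution ODE associated with NAG-SC (step size $s$, strong-convexity parameter $\mu$) takes the form below. This is recalled only as a hedged sketch, not a statement of the main paper's exact dynamics.

  \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \bigl(1 + \sqrt{\mu s}\bigr)\,\nabla f(X(t)) = 0

The Hessian-dependent gradient-correction term $\sqrt{s}\,\nabla^2 f(X)\,\dot{X}$ is what the main paper reformulates in a Hessian-free way, with its coefficient promoted to a tunable hyperparameter before noise is injected.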