Hessian-Free High-Resolution Nesterov Acceleration for Sampling
- URL: http://arxiv.org/abs/2006.09230v4
- Date: Sat, 18 Jun 2022 01:50:00 GMT
- Title: Hessian-Free High-Resolution Nesterov Acceleration for Sampling
- Authors: Ruilin Li, Hongyuan Zha, Molei Tao
- Abstract summary: Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
- Score: 55.498092486970364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nesterov's Accelerated Gradient (NAG) for optimization has better performance
than its continuous time limit (noiseless kinetic Langevin) when a finite
step-size is employed \citep{shi2021understanding}. This work explores the
sampling counterpart of this phenomenon and proposes a diffusion process, whose
discretizations can yield accelerated gradient-based MCMC methods. More
precisely, we reformulate the optimizer of NAG for strongly convex functions
(NAG-SC) as a Hessian-Free High-Resolution ODE, change its high-resolution
coefficient to a hyperparameter, inject appropriate noise, and discretize the
resulting diffusion process. The acceleration effect of the new hyperparameter
is quantified, and it is shown not to be an artifact created by time-rescaling.
Instead, acceleration beyond underdamped Langevin in $W_2$ distance is
quantitatively established for log-strongly-concave-and-smooth targets, at both
the continuous dynamics level and the discrete algorithm level. Empirical
experiments in both log-strongly-concave and multi-modal cases also numerically
demonstrate this acceleration.
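As an illustration of the construction described above (kinetic Langevin plus an extra tunable high-resolution-style coefficient, with noise injected so that the target measure is preserved), below is a minimal Python sketch. It is only one plausible reading of the abstract, not the authors' exact HFHR dynamics or discretization: it augments the position equation of underdamped Langevin with an overdamped drift/noise pair of strength alpha, which keeps exp(-f(x) - |v|^2/2) invariant, and uses a plain Euler-Maruyama step. The target, the function names, and the parameter values (alpha, gamma, h) are hypothetical illustration choices.

import numpy as np

def grad_f(x):
    # Gradient of f(x) = 0.5 * x^T A x, a toy strongly log-concave target
    # (condition number 10). Any smooth, strongly convex f would do here.
    A = np.diag([1.0, 10.0])
    return A @ x

def accelerated_langevin_sketch(n_steps=5000, h=0.01, alpha=1.0, gamma=2.0, seed=0):
    # Euler-Maruyama sketch (illustrative, not the paper's scheme) of
    #   dx = (v - alpha * grad f(x)) dt + sqrt(2 alpha) dW1
    #   dv = (-gamma * v - grad f(x)) dt + sqrt(2 gamma) dW2
    # Both pieces leave exp(-f(x) - |v|^2/2) invariant; alpha = 0 recovers a
    # plain discretization of underdamped Langevin, so alpha plays the role
    # of the extra acceleration knob described in the abstract.
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    v = np.zeros(2)
    samples = np.empty((n_steps, 2))
    for k in range(n_steps):
        x = x + h * (v - alpha * grad_f(x)) + np.sqrt(2.0 * alpha * h) * rng.standard_normal(2)
        v = v + h * (-gamma * v - grad_f(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal(2)
        samples[k] = x
    return samples

if __name__ == "__main__":
    s = accelerated_langevin_sketch()
    # The empirical covariance of x should approach A^{-1} = diag(1, 0.1).
    print(np.cov(s[1000:].T))

Running the sketch with alpha = 0 and with alpha > 0 on the same target is a simple way to compare mixing behavior and see the effect of the extra coefficient numerically.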
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD with clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for
Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Accelerating Convergence in Global Non-Convex Optimization with
Reversible Diffusion [0.0]
Langevin dynamics has been used extensively in global non-convex optimization.
Our proposed method is used to investigate the trade-off between speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z) - Revisiting the acceleration phenomenon via high-resolution differential
equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for the $\mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme (the high-resolution ODE in question is recalled in the sketch after this list).
arXiv Detail & Related papers (2022-12-12T04:36:37Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - A modified limited memory Nesterov's accelerated quasi-Newton [0.0]
The Nesterov's accelerated quasi-Newton (L)NAQ method has been shown to accelerate the conventional (L)BFGS quasi-Newton method.
The Momentum accelerated Quasi-Newton (MoQ) method showed that the Nesterov's accelerated gradient can be approximated as a linear combination of past gradients.
This abstract extends the MoQ approximation to limited memory NAQ and evaluates the performance on a function approximation problem.
arXiv Detail & Related papers (2021-12-01T06:48:47Z) - Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction [24.794221009364772]
We study variance reduction for noisy energy estimators, which enables much more effective analysis.
We obtain the state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.
arXiv Detail & Related papers (2020-10-02T16:23:35Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
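For reference, and to connect the entry on revisiting the acceleration phenomenon above with the main abstract: to the best of our reading of \citep{shi2021understanding}, the high-resolution ODE associated with NAG-SC (step size $s$, strong-convexity parameter $\mu$) takes the form below. This is recalled only as a hedged sketch, not a statement of the main paper's exact dynamics.

  \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \bigl(1 + \sqrt{\mu s}\bigr)\,\nabla f(X(t)) = 0

The Hessian-dependent gradient-correction term $\sqrt{s}\,\nabla^2 f(X)\,\dot{X}$ is what the main paper reformulates in a Hessian-free way, with its coefficient promoted to a tunable hyperparameter before noise is injected.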