Related papers: Tuning-Free Sampling via Optimization on the Space of Probability Measures

Tuning-Free Sampling via Optimization on the Space of Probability Measures

URL: http://arxiv.org/abs/2510.25315v1
Date: Wed, 29 Oct 2025 09:29:39 GMT
Title: Tuning-Free Sampling via Optimization on the Space of Probability Measures
Authors: Louis Sharrock, Christopher Nemeth,
Abstract summary: tuning-free sampling algorithms for gradient-based sampling algorithms obtained as time-discretizations of Wasserstein gradient flows.<n>We establish strong theoretical guarantees for our approach. In particular, we recover the convergence rate of optimally tuned versions of these algorithms up to logarithmic factors.<n>Across a variety of tasks, our algorithms achieve similar performance to the optimal performance of existing algorithms, with no need to tune a step size parameter.
Score: 14.429970585649945
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce adaptive, tuning-free step size schedules for gradient-based sampling algorithms obtained as time-discretizations of Wasserstein gradient flows. The result is a suite of tuning-free sampling algorithms, including tuning-free variants of the unadjusted Langevin algorithm (ULA), stochastic gradient Langevin dynamics (SGLD), mean-field Langevin dynamics (MFLD), Stein variational gradient descent (SVGD), and variational gradient descent (VGD). More widely, our approach yields tuning-free algorithms for solving a broad class of stochastic optimization problems over the space of probability measures. Under mild assumptions (e.g., geodesic convexity and locally bounded stochastic gradients), we establish strong theoretical guarantees for our approach. In particular, we recover the convergence rate of optimally tuned versions of these algorithms up to logarithmic factors, in both nonsmooth and smooth settings. We then benchmark the performance of our methods against comparable existing approaches. Across a variety of tasks, our algorithms achieve similar performance to the optimal performance of existing algorithms, with no need to tune a step size parameter.

Related papers

Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Gradient Descent (SGD) under heavy-tailed noise. Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise. We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization [0.0]
We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization. The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum.
arXiv Detail & Related papers (2024-05-02T21:04:20Z)
Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z)
Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems [0.7614628596146599]
We propose a novel algorithm for adaptive step length selection in the classical SGD framework. Under reasonable conditions, the algorithm produces step lengths in line with well-established theoretical requirements. We show that the algorithm can generate step lengths comparable to the best step length obtained from manual tuning.
arXiv Detail & Related papers (2023-05-17T06:22:11Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $mathcalO(T2/3)$ and $mathcalO(T5/6)$ dynamic regret for SGD and SGLD respectively when run with $mathcalO(T5/6)$ step sizes.
arXiv Detail & Related papers (2021-03-23T00:28:15Z)
Sequential Subspace Search for Functional Bayesian Optimization Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process. Standard Bayesian optimisation is applied on each subspace, and the best solution found used as a starting point (origin) for the next subspace. We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
arXiv Detail & Related papers (2020-09-08T06:54:11Z)
An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization [0.0]
We propose an adaptive gradient-free (ASGF) approach for high-dimensional non-smoothing problems. We illustrate the performance of this method on benchmark global problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error. Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.