High-accuracy log-concave sampling with stochastic queries
- URL: http://arxiv.org/abs/2602.14342v1
- Date: Sun, 15 Feb 2026 23:19:07 GMT
- Title: High-accuracy log-concave sampling with stochastic queries
- Authors: Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin,
- Abstract summary: We show that high-accuracy guarantees for log-concave sampling are achievable using stochastic gradients with subexponential tails. Our framework also provides similar high-accuracy guarantees under stochastic zeroth-order (value) queries.
- Score: 70.90863485771405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries.
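The setting can be illustrated with a minimal one-dimensional sketch of Langevin-type sampling driven by a stochastic gradient oracle. This is generic stochastic gradient Langevin dynamics on an illustrative Gaussian target, not the paper's algorithm; the step size, noise level, and burn-in are assumptions for the example.

```python
import math
import random

def sgld(stoch_grad, x0, step, n_iters, seed=0):
    """Stochastic Gradient Langevin Dynamics (1-D sketch):
    x_{k+1} = x_k - step * g(x_k) + sqrt(2*step) * N(0, 1),
    where g is an unbiased stochastic gradient of the potential."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iters):
        x = x - step * stoch_grad(x, rng) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Target: standard Gaussian, potential V(x) = x^2 / 2, so grad V(x) = x.
# The oracle returns grad V(x) plus light-tailed (Gaussian) query noise.
def noisy_grad(x, rng):
    return x + 0.1 * rng.gauss(0.0, 1.0)

samples = sgld(noisy_grad, x0=3.0, step=0.05, n_iters=20000)
tail = samples[5000:]  # discard burn-in
mean = sum(tail) / len(tail)
var = sum((s - mean) ** 2 for s in tail) / len(tail)
```

With light-tailed gradient noise as here, the empirical mean and variance of the chain approach those of the standard Gaussian target, up to discretization bias controlled by the step size.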
Related papers
- Gradient-free stochastic optimization for additive models [50.57026826740147]
We address the problem of zeroth-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition. We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions.
arXiv Detail & Related papers (2025-03-03T23:39:08Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider stochastic optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
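The role of clipping under heavy-tailed noise can be sketched in one dimension: the clipped estimate keeps its direction but its magnitude is capped, so a single heavy-tailed draw cannot destroy the iterate. The clipping level, step size, and Pareto noise model below are illustrative assumptions, not the paper's setting.

```python
import random

def clipped_sgd(grad_oracle, x0, step, clip, n_iters, seed=1):
    """SGD with gradient clipping (1-D sketch): replace g by
    g * min(1, clip / |g|), so heavy-tailed noise cannot blow up a step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iters):
        g = grad_oracle(x, rng)
        if abs(g) > clip:
            g = clip * g / abs(g)  # cap magnitude, keep sign
        x -= step * g
    return x

# Objective f(x) = x^2 / 2 (minimum at 0); the oracle adds symmetrized
# Pareto noise with tail index 1.5, which has infinite variance.
def heavy_tailed_grad(x, rng):
    return x + rng.paretovariate(1.5) * rng.choice([-1.0, 1.0])

x_hat = clipped_sgd(heavy_tailed_grad, x0=5.0, step=0.01, clip=2.0, n_iters=5000)
```

Because the noise is symmetric, clipping introduces negligible bias near the optimum, and the iterate settles in a small neighborhood of the minimizer despite the infinite-variance noise.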
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees [7.08249229857925]
We consider minimization of a smooth nonconvex function with inexact oracle access to its gradient and Hessian.
A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, its sign is chosen to be positive or negative with equal probability.
arXiv Detail & Related papers (2023-10-28T22:57:56Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
We use quantiles of the gradient norm as clipping thresholds, which makes the iterates robust to outliers.
We propose an implementation of the algorithm using rolling quantile estimates.
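A rolling-quantile variant of clipping can be sketched as follows: the clipping threshold is the empirical q-th quantile of recent gradient norms, so it adapts to the gradient scale instead of being fixed in advance. The window size, quantile level, step size, and noise model are all illustrative assumptions.

```python
import random
from collections import deque

def quantile_clipped_sgd(grad_oracle, x0, step, q=0.9, window=100,
                         n_iters=4000, seed=2):
    """SGD where the clipping threshold is the rolling q-th quantile
    of recent gradient norms (1-D sketch)."""
    rng = random.Random(seed)
    x = x0
    norms = deque(maxlen=window)  # sliding window of recent |g|
    for _ in range(n_iters):
        g = grad_oracle(x, rng)
        norms.append(abs(g))
        # empirical q-quantile of the window as the clipping level
        thr = sorted(norms)[int(q * (len(norms) - 1))]
        if thr > 0 and abs(g) > thr:
            g = thr * g / abs(g)
        x -= step * g
    return x

# Quadratic objective with symmetrized heavy-tailed Pareto noise.
def heavy_tailed_grad(x, rng):
    return x + rng.paretovariate(1.5) * rng.choice([-1.0, 1.0])

x_hat = quantile_clipped_sgd(heavy_tailed_grad, x0=5.0, step=0.01)
```

The adaptive threshold tracks the typical gradient magnitude, so outliers far beyond the q-th quantile are clipped while ordinary gradients pass through untouched.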
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
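The connection between GP posteriors and linear-system solving can be sketched by running gradient descent on the quadratic whose stationary point solves the regularized kernel system $(K + \sigma^2 I)v = y$. This dense toy version uses full gradients for clarity; the paper's method uses stochastic gradients at scale, and the kernel, points, and step size here are illustrative assumptions.

```python
import math

def rbf_kernel(xs, lengthscale=1.0):
    """Dense RBF kernel matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 l^2))."""
    n = len(xs)
    return [[math.exp(-((xs[i] - xs[j]) ** 2) / (2 * lengthscale ** 2))
             for j in range(n)] for i in range(n)]

def gd_solve(A, y, step=0.05, n_iters=2000):
    """Solve A v = y by gradient descent on f(v) = 1/2 v^T A v - y^T v,
    whose gradient is A v - y."""
    n = len(y)
    v = [0.0] * n
    for _ in range(n_iters):
        grad = [sum(A[i][j] * v[j] for j in range(n)) - y[i] for i in range(n)]
        v = [v[i] - step * grad[i] for i in range(n)]
    return v

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.sin(x) for x in xs]
K = rbf_kernel(xs)
# A = K + sigma^2 I with noise variance sigma^2 = 0.1 (assumed)
A = [[K[i][j] + (0.1 if i == j else 0.0) for j in range(len(xs))]
     for i in range(len(xs))]
v = gd_solve(A, y)
# residual of the linear system: (K + sigma^2 I) v should be close to y
residual = max(abs(sum(A[i][j] * v[j] for j in range(len(xs))) - y[i])
               for i in range(len(xs)))
```

The solution `v` of this system gives the GP posterior mean at any test point as a weighted sum of kernel evaluations against the training points.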
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Zero-Order One-Point Estimate with Distributed Stochastic Gradient-Tracking Technique [23.63073074337495]
We consider a distributed multi-agent optimization problem where each agent holds a local objective function that is smooth and convex.
We extend the distributed gradient-tracking method to the bandit setting where we don't have an estimate of the gradient.
We analyze the convergence of this novel technique for smooth and convex objectives using approximation tools.
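A one-point (single-function-value) zeroth-order gradient estimate can be sketched in one dimension: query the function at a randomly perturbed point and scale the value by the perturbation direction. The batch averaging and all constants below are illustrative assumptions; the paper's setting is distributed and multi-agent.

```python
import random

def zo_one_point_grad(f, x, delta, rng, batch=50):
    """One-point zeroth-order gradient estimate (1-D sketch):
    g = f(x + delta * s) * s / delta with a random sign s; averaging
    over a small batch (an illustrative choice) reduces its variance."""
    total = 0.0
    for _ in range(batch):
        s = rng.choice([-1.0, 1.0])
        total += f(x + delta * s) * s / delta
    return total / batch

def zo_descent(f, x0, step=0.02, delta=0.1, n_iters=1500, seed=4):
    """Gradient descent driven only by function-value queries."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iters):
        x -= step * zo_one_point_grad(f, x, delta, rng)
    return x

f = lambda x: (x - 2.0) ** 2  # minimum at x = 2
x_hat = zo_descent(f, x0=5.0)
```

In expectation the estimator equals a central difference, so for a quadratic it is an unbiased gradient; its variance scales like $f(x)^2/\delta^2$, which is why one-point estimates are much noisier than two-point ones.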
arXiv Detail & Related papers (2022-10-11T17:04:45Z) - Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence rate of $\tilde{O}(t^{-1/4})$ and a complexity of $\tilde{O}(\varepsilon^{-4})$ in terms of the Moreau envelope for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data, based on projected stochastic gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z) - High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider nonconvex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points in high probability with the best-known iteration complexity for smooth losses.
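The three ingredients just named can be combined in a 1-D sketch: clip the stochastic gradient, smooth it with momentum, then take a normalized (unit-direction) step. All constants and the Pareto noise model are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def clipped_momentum_ngd(grad_oracle, x0, step=0.01, beta=0.9, clip=2.0,
                         n_iters=3000, seed=5):
    """Clipping + momentum + normalized gradient descent (1-D sketch)."""
    rng = random.Random(seed)
    x, m = x0, 0.0
    for _ in range(n_iters):
        g = grad_oracle(x, rng)
        if abs(g) > clip:
            g = clip * g / abs(g)      # clip the heavy-tailed estimate
        m = beta * m + (1 - beta) * g  # momentum averaging
        if m != 0.0:
            x -= step * m / abs(m)     # normalized (sign) step in 1-D
    return x

# Quadratic objective with very heavy-tailed noise (tail index 1.2,
# so even the mean of the raw noise magnitude is large).
def heavy_tailed_grad(x, rng):
    return x + rng.paretovariate(1.2) * rng.choice([-1.0, 1.0])

x_hat = clipped_momentum_ngd(heavy_tailed_grad, x0=4.0)
```

Normalization makes every step the same size regardless of gradient magnitude, so the roles split cleanly: clipping tames single outliers, momentum averages out the remaining noise, and the step size alone controls progress.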
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Improved Complexities for Stochastic Conditional Gradient Methods under Interpolation-like Conditions [12.096252285460814]
We show that the conditional gradient method requires $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an $\epsilon$-optimal solution.
By including a gradient sliding step, we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
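The conditional gradient (Frank-Wolfe) template with a stochastic gradient can be sketched as follows: each iteration calls a linear minimization oracle on a noisy gradient and moves toward the returned vertex. The simplex objective, the noise level, and the classic step rule 2/(k+2) are illustrative assumptions, not the paper's analysis.

```python
import random

def stochastic_frank_wolfe(grad_oracle, x0, lmo, n_iters, seed=6):
    """Stochastic conditional gradient (Frank-Wolfe) sketch."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(n_iters):
        g = grad_oracle(x, rng)
        s = lmo(g)                # linear minimization oracle over the set
        gamma = 2.0 / (k + 2)     # classic Frank-Wolfe step size
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

# Minimize f(x) = sum((x_i - c_i)^2) over the probability simplex,
# with c itself in the simplex so the optimum is x* = c.
c = [0.5, 0.3, 0.2]

def noisy_grad(x, rng):
    # unbiased gradient with small Gaussian query noise (illustrative)
    return [2 * (xi - ci) + 0.05 * rng.gauss(0.0, 1.0)
            for xi, ci in zip(x, c)]

def simplex_lmo(g):
    # vertex of the simplex minimizing <g, s>: indicator of the argmin entry
    j = min(range(len(g)), key=lambda i: g[i])
    return [1.0 if i == j else 0.0 for i in range(len(g))]

x_hat = stochastic_frank_wolfe(noisy_grad, [1.0, 0.0, 0.0], simplex_lmo, 1000)
gap = max(abs(xi - ci) for xi, ci in zip(x_hat, c))
```

Each iterate is a convex combination of simplex vertices, so the method stays feasible without any projection step, which is the appeal of conditional gradient methods over projected gradient ones.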
arXiv Detail & Related papers (2020-06-15T06:51:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.