Convergence Rates for Non-Log-Concave Sampling and Log-Partition
Estimation
- URL: http://arxiv.org/abs/2303.03237v3
- Date: Tue, 1 Aug 2023 13:09:53 GMT
- Title: Convergence Rates for Non-Log-Concave Sampling and Log-Partition
Estimation
- Authors: David Holzmüller, Francis Bach
- Abstract summary: It is known that for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is $O(n^{-m/d})$.
We ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and
computing their log-partition function are fundamental tasks in statistics,
machine learning, and statistical physics. However, while efficient algorithms
are known for convex potentials $V$, the situation is much more difficult in
the non-convex case, where algorithms necessarily suffer from the curse of
dimensionality in the worst case. For optimization, which can be seen as a
low-temperature limit of sampling, it is known that smooth functions $V$ allow
faster convergence rates. Specifically, for $m$-times differentiable functions
in $d$ dimensions, the optimal rate for algorithms with $n$ function
evaluations is known to be $O(n^{-m/d})$, where the constant can potentially
depend on $m, d$ and the function to be optimized. Hence, the curse of
dimensionality can be alleviated for smooth functions at least in terms of the
convergence rate. Recently, it has been shown that similarly fast rates can
also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$
is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates
for sampling and log-partition computation are possible, and whether they can
be realized in polynomial time with an exponent independent of $m$ and $d$. We
show that the optimal rates for sampling and log-partition computation are
sometimes equal and sometimes faster than for optimization. We then analyze
various polynomial-time sampling algorithms, including an extension of a recent
promising optimization approach, and find that they sometimes exhibit
interesting behavior but no near-optimal rates. Our results also give further
insights on the relation between sampling, log-partition, and optimization
problems.
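To make the objects in the abstract concrete, here is a minimal, illustrative sketch (not an algorithm from the paper): it samples a one-dimensional Gibbs distribution $p(x) \propto \exp(-V(x)/\varepsilon)$ with a non-convex double-well potential via the unadjusted Langevin algorithm, computes a quadrature reference value for the log-partition function, and checks the low-temperature link to optimization, $-\varepsilon \log Z \approx \min_x V(x)$. The potential, temperature, and step size are hypothetical choices for the example.

```python
# Illustrative sketch only (not an algorithm from the paper): unadjusted
# Langevin sampling from a 1-D Gibbs distribution p(x) ~ exp(-V(x)/eps)
# with a non-convex double-well potential, plus a quadrature reference
# value for the log-partition function log Z = log \int exp(-V(x)/eps) dx.
import numpy as np

rng = np.random.default_rng(0)

def V(x):
    # Smooth, non-convex double-well potential (hypothetical example choice).
    return (x**2 - 1.0) ** 2

def grad_V(x):
    return 4.0 * x * (x**2 - 1.0)

eps = 0.25          # temperature parameter epsilon
h = 1e-3            # ULA step size
n_steps, burn_in = 200_000, 20_000

# Unadjusted Langevin algorithm (ULA) targeting exp(-V(x)/eps) up to
# discretization bias: x <- x - (h/eps) * V'(x) + sqrt(2h) * N(0, 1).
x = 0.0
samples = np.empty(n_steps)
for k in range(n_steps):
    x = x - (h / eps) * grad_V(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    samples[k] = x
samples = samples[burn_in:]

# Log-partition reference by 1-D quadrature; this brute-force route is only
# feasible in low dimension, which is exactly the curse of dimensionality
# discussed in the abstract.
grid = np.linspace(-3.0, 3.0, 20_001)
dx = grid[1] - grid[0]
log_Z = np.log(np.sum(np.exp(-V(grid) / eps)) * dx)

print("fraction of samples near the wells at +/-1:",
      np.mean(np.abs(np.abs(samples) - 1.0) < 0.5))
print("log-partition estimate (quadrature):", log_Z)
# Low-temperature limit: -eps * log Z -> min_x V(x) = 0 as eps -> 0, i.e.
# optimization arises as a limit of log-partition estimation.
print("-eps * log Z (approaches min V = 0 for small eps):", -eps * log_Z)
```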
Related papers
- Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum [30.01198677588252]
First-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ complexity to find an $\varepsilon$-stationary point.
We introduce novel momentum algorithms utilizing efficient variable complexity.
The effectiveness of the method is validated through robust logistic regression using real-world datasets.
arXiv Detail & Related papers (2024-06-18T20:14:52Z) - Weighted least-squares approximation with determinantal point processes and generalized volume sampling [33.33724208084121]
We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$.
We show that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm.
arXiv Detail & Related papers (2023-12-21T17:34:18Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Faster Differentially Private Samplers via Rényi Divergence Analysis
of Discretized Langevin MCMC [35.050135428062795]
Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance.
Our techniques are simple and generic and apply to underdamped Langevin dynamics; a minimal, illustrative discretization sketch appears after this list.
arXiv Detail & Related papers (2020-10-27T22:52:45Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min(n, n^2/d))$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
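The discretized Langevin MCMC entry above mentions underdamped Langevin dynamics. For reference, here is a generic Euler-type discretization sketch of underdamped (kinetic) Langevin dynamics, whose stationary law has $x$-marginal proportional to $\exp(-U(x))$; this is not the scheme analyzed in the cited paper, and the potential, friction parameter, and step size are hypothetical choices.

```python
# Generic, illustrative Euler-type discretization of underdamped (kinetic)
# Langevin dynamics: dx = v dt, dv = -(gamma v + grad U(x)) dt + sqrt(2 gamma) dB.
# Not the scheme from the cited paper; constants are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def U(x):
    return (x**2 - 1.0) ** 2   # same double-well potential as in the sketch above

def grad_U(x):
    return 4.0 * x * (x**2 - 1.0)

gamma, h, n_steps, burn_in = 2.0, 1e-3, 200_000, 20_000
x, v = 0.0, 0.0
xs = np.empty(n_steps)
for k in range(n_steps):
    # Velocity update (friction + potential force + injected noise), then position update.
    v = v - h * (gamma * v + grad_U(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal()
    x = x + h * v
    xs[k] = x

print("mean |x| of samples (wells sit at +/-1):", np.mean(np.abs(xs[burn_in:])))
```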
This list is automatically generated from the titles and abstracts of the papers in this site.