Simulated annealing from continuum to discretization: a convergence
analysis via the Eyring--Kramers law
- URL: http://arxiv.org/abs/2102.02339v1
- Date: Wed, 3 Feb 2021 23:45:39 GMT
- Title: Simulated annealing from continuum to discretization: a convergence
analysis via the Eyring--Kramers law
- Authors: Wenpin Tang and Xun Yu Zhou
- Abstract summary: We study the convergence rate of continuous-time simulated annealing $(X_t;\, t \ge 0)$ and its discretization $(x_k;\, k = 0,1,\ldots)$
We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays polynomially in time (resp. in cumulative step size)
- Score: 10.406659081400354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the convergence rate of continuous-time simulated annealing $(X_t;
\, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for
approximating the global optimum of a given function $f$. We prove that the
tail probability $\mathbb{P}(f(X_t) > \min f +\delta)$ (resp.
$\mathbb{P}(f(x_k) > \min f +\delta)$) decays polynomially in time (resp. in
cumulative step size), and provide an explicit rate as a function of the model
parameters. Our argument applies the recent development on functional
inequalities for the Gibbs measure at low temperatures -- the Eyring-Kramers
law. In the discrete setting, we obtain a condition on the step size to ensure
convergence.
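The discretized chain $(x_k)$ described in the abstract is a Langevin-type recursion: a gradient step plus Gaussian noise whose magnitude shrinks as the temperature $1/\beta_k$ cools. A minimal, self-contained sketch (the double-well objective, the cooling schedule $\beta_k = \log(1+k)$, and the step size are illustrative choices, not the paper's exact parameters):

```python
import numpy as np

def simulated_annealing(f, grad_f, x0, n_steps=5000, eta=0.01, seed=0):
    """Langevin-type simulated annealing: noisy gradient descent with an
    increasing inverse temperature beta_k (i.e., a cooling schedule)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_steps + 1):
        beta = np.log(1 + k)  # illustrative logarithmic cooling schedule
        noise = rng.standard_normal(x.shape)
        # Euler step for dX_t = -grad f(X_t) dt + sqrt(2/beta_t) dW_t
        x = x - eta * grad_f(x) + np.sqrt(2 * eta / beta) * noise
    return x

# Example: f(x) = (x^2 - 1)^2 has global minima at x = +/- 1.
f = lambda x: (x**2 - 1) ** 2
grad_f = lambda x: 4 * x * (x**2 - 1)
x_final = simulated_annealing(f, grad_f, x0=np.array([2.0]))
```

The paper's discrete-time convergence hinges on a condition relating the step sizes to the cooling schedule; the small constant step above is used purely for illustration.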
Related papers
- Further Understanding of a Local Gaussian Process Approximation: Characterising Convergence in the Finite Regime [1.3518297878940662]
We show that common choices of kernel functions for a highly accurate and massively scalable GPnn regression model exhibit gradual convergence to asymptotic behaviour as dataset size $n$ increases.
Similar bounds can be found under model misspecification and combined to give overall rates of convergence of both MSE and an important calibration metric.
arXiv Detail & Related papers (2024-04-09T10:47:01Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ measured by $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\big[\|\nabla f(x_k)\|_1\big]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\big[\|\nabla f(x_k)\|_2^2\big] \le O(\frac{1}{\sqrt{T}})$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow [8.052709336750823]
We show that when the target measure $\pi$ exhibits multiple modes, the exponential convergence rate of the Fisher-Rao gradient flow is independent of the potential function.
We provide an explicit expansion of $\mathrm{KL}(\rho_t \,\|\, \pi)$ along the Fisher-Rao gradient flow.
arXiv Detail & Related papers (2023-02-23T18:47:54Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - On quantitative Laplace-type convergence results for some exponential
probability measures, with two applications [2.9189409618561966]
We find a limit of the sequence of measures $(\pi_\varepsilon)_{\varepsilon > 0}$ with densities $(\mathrm{d}\pi_\varepsilon / \mathrm{d}x)_{\varepsilon > 0}$ w.r.t. the Lebesgue measure.
arXiv Detail & Related papers (2021-10-25T13:00:25Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - On Convergence of Gradient Expected Sarsa($\lambda$) [25.983112169543375]
We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy learning.
We propose a convergent $\mathtt{Gradient~Expected~Sarsa}(\lambda)$ ($\mathtt{GES}(\lambda)$) algorithm.
arXiv Detail & Related papers (2020-12-14T01:27:24Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.