Uniform Generalization Bound on Time and Inverse Temperature for
Gradient Descent Algorithm and its Application to Analysis of Simulated
Annealing
- URL: http://arxiv.org/abs/2205.12959v1
- Date: Wed, 25 May 2022 01:21:27 GMT
- Title: Uniform Generalization Bound on Time and Inverse Temperature for
Gradient Descent Algorithm and its Application to Analysis of Simulated
Annealing
- Authors: Keisuke Suzuki
- Abstract summary: We propose a novel generalization bound for stochastic gradient Langevin dynamics that is uniform in the time and inverse temperature.
As an application of our generalization bound, an evaluation of the effectiveness of simulated annealing is presented.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel uniform generalization bound on the time
and inverse temperature for stochastic gradient Langevin dynamics (SGLD) in a
non-convex setting. While previous works derive their generalization bounds by
uniform stability, we use Rademacher complexity to make our generalization
bound independent of the time and inverse temperature. Using Rademacher
complexity, we can reduce the problem to derive a generalization bound on the
whole space to that on a bounded region and therefore can remove the effect of
the time and inverse temperature from our generalization bound. As an
application of our generalization bound, an evaluation of the effectiveness of
the simulated annealing in a non-convex setting is also described. For the
sample size $n$ and time $s$, we derive evaluations with orders $\sqrt{n^{-1}
\log (n+1)}$ and $|(\log)^4(s)|^{-1}$, respectively. Here, $(\log)^4$ denotes
the four-fold composition of the logarithmic function.
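The object the bound applies to is the standard SGLD recursion: a gradient step on the empirical loss plus Gaussian noise whose scale is set by the inverse temperature $\beta$. The sketch below is a minimal, generic illustration of that recursion on a toy non-convex objective, not the paper's exact algorithm; the objective, step size, and $\beta$ are illustrative choices.

```python
import numpy as np

def sgld_step(w, grad, step_size, inv_temp, rng):
    """One SGLD update: gradient step plus Gaussian noise whose variance
    is 2 * step_size / inv_temp. Larger inv_temp (beta) -> less noise."""
    noise = rng.normal(size=w.shape)
    return w - step_size * grad(w) + np.sqrt(2.0 * step_size / inv_temp) * noise

# Toy non-convex objective (illustrative only): f(w) = sum(w^4 - w^2),
# which has two symmetric minima per coordinate at w = +-1/sqrt(2).
f = lambda w: np.sum(w**4 - w**2)
grad_f = lambda w: 4.0 * w**3 - 2.0 * w

rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(5000):
    w = sgld_step(w, grad_f, step_size=1e-3, inv_temp=10.0, rng=rng)
```

At a fixed inverse temperature the iterates approximately sample the Gibbs measure $\propto e^{-\beta f}$, which is why bounds that degrade as $\beta$ or the running time grow are restrictive and a uniform bound is of interest.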
Related papers
- Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters [0.7614628596146599]
We present a unified algorithm for optimization that makes use of a "momentum" term. The update depends not only on the current true gradient of the objective function, but also on the true gradient at the previous iteration.
arXiv Detail & Related papers (2025-06-13T15:53:17Z) - Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using a stochastic gradient descent (SGD) algorithm with a constant step size.
We provide an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$.
We establish that this chain is geometrically ergodic with respect to a defined weighted Wasserstein semimetric.
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Towards Understanding the Generalizability of Delayed Stochastic
Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability at the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Simulated annealing from continuum to discretization: a convergence
analysis via the Eyring--Kramers law [10.406659081400354]
We study the convergence rate of continuous-time simulated annealing $(X_t,\, t \ge 0)$ and its discretization $(x_k,\, k = 0, 1, \ldots)$.
We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays in time (resp. in cumulative step size).
arXiv Detail & Related papers (2021-02-03T23:45:39Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Learning Complexity of Simulated Annealing [23.678288517578764]
We study the criteria under which the temperature changes, namely the cooling schedule.
We provide positive results both in terms of sample complexity and simulation complexity.
We present polynomial-time algorithms with provable guarantees for the learning problem.
arXiv Detail & Related papers (2020-03-06T00:49:03Z)
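Several of the entries above concern simulated annealing and its cooling schedule. The sketch below is a minimal, generic simulated-annealing loop with the classical logarithmic schedule $T_k = c / \log(k + 2)$, under which convergence to global minimizers is known for suitable $c$; the objective, neighbor move, and constants are illustrative assumptions, not taken from any of the listed papers.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, steps=20000, c=1.0):
    """Minimize f with Metropolis moves under the logarithmic cooling
    schedule T_k = c / log(k + 2); tracks the best point ever visited."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for k in range(steps):
        temp = c / math.log(k + 2)   # cooling schedule
        y = neighbor(x)
        fy = f(y)
        # Always accept improvements; accept a worse move with
        # probability exp(-(fy - fx) / temp).
        if fy <= fx or random.random() < math.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

random.seed(0)
f = lambda x: x**4 - x**2                      # non-convex, minima at +-1/sqrt(2)
nbr = lambda x: x + random.uniform(-0.3, 0.3)  # local proposal move
x_star, f_star = simulated_annealing(f, x0=3.0, neighbor=nbr)
```

Slower-than-logarithmic cooling can trap the chain in a local minimum, while faster cooling loses the convergence guarantee; this trade-off is exactly what the cooling-schedule analyses above quantify.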
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.