Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz
Inequality
- URL: http://arxiv.org/abs/2110.04926v1
- Date: Sun, 10 Oct 2021 23:20:04 GMT
- Title: Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz
Inequality
- Authors: Xiao Li, Andre Milzarek, and Junwen Qiu
- Abstract summary: We conduct convergence analysis for the non-descent RR with diminishing step sizes based on the Kurdyka-Lojasiewicz (KL) inequality.
We derive the corresponding rate of convergence depending on the KL exponent and the suitably selected diminishing step sizes.
We also establish similar strong limit-point convergence results for the proximal point method.
- Score: 3.960041987749073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the random reshuffling (RR) method for smooth nonconvex optimization
problems with a finite-sum structure. Though this method is widely utilized in
practice such as the training of neural networks, its convergence behavior is
only understood in several limited settings. In this paper, under the
well-known Kurdyka-Lojasiewicz (KL) inequality, we establish strong limit-point
convergence results for RR with appropriate diminishing step sizes, namely, the
whole sequence of iterates generated by RR is convergent and converges to a
single stationary point in an almost sure sense. In addition, we derive the
corresponding rate of convergence, depending on the KL exponent and the
suitably selected diminishing step sizes. When the KL exponent lies in
$[0,\frac12]$, the convergence is at a rate of $\mathcal{O}(t^{-1})$ with $t$
counting the iteration number. When the KL exponent belongs to $(\frac12,1)$,
our derived convergence rate is of the form $\mathcal{O}(t^{-q})$ with $q\in
(0,1)$ depending on the KL exponent. The standard KL inequality-based
convergence analysis framework only applies to algorithms with a certain
descent property. Remarkably, we conduct convergence analysis for the
non-descent RR with diminishing step sizes based on the KL inequality, which
generalizes the standard KL analysis framework. We summarize our main steps and
core ideas in an analysis framework, which is of independent interest. As a
direct application of this framework, we also establish similar strong
limit-point convergence results for the shuffled proximal point method.
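To make the setting concrete, here is a minimal sketch of the random reshuffling (RR) method on a toy finite-sum least-squares problem with a diminishing step size of order $1/t$. The quadratic components $f_i(x) = \frac12 (a_i^\top x - b_i)^2$, the step-size constant $0.5$, and the epoch count are illustrative choices, not taken from the paper.

```python
import numpy as np

def random_reshuffling(A, b, x0, epochs=2000, seed=0):
    """RR for min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2.

    Each epoch draws a fresh permutation of the n component
    functions and takes one gradient step per component, in the
    shuffled order, with a diminishing step size alpha_t ~ 1/t.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = x0.astype(float).copy()
    for t in range(1, epochs + 1):
        alpha = 0.5 / t                # diminishing step size
        perm = rng.permutation(n)      # reshuffle every epoch
        for i in perm:
            grad_i = A[i] * (A[i] @ x - b[i])  # gradient of f_i at x
            x = x - alpha * grad_i
    return x

# Usage: a least-squares instance with known solution x* = (1, 2),
# so the whole iterate sequence should settle near that single point.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = A @ np.array([1.0, 2.0])
x = random_reshuffling(A, b, np.zeros(2))
```

Note that each inner step uses the gradient of a single component $f_i$, not of the full objective, which is why RR lacks the per-iteration descent property that the standard KL analysis framework relies on.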
Related papers
- Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
We study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$.
Our results apply to a class of "invex" functions, which have the property that every stationary point is also a global minimizer.
arXiv Detail & Related papers (2023-12-05T15:22:39Z) - A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization [6.314057999212246]
Random reshuffling techniques are used in large-scale applications, such as neural networks.
In this paper, we show that the random reshuffling-type iterations generated by norm-PRR converge to stationary points.
Finally, we derive last-iterate convergence rates for the proposed approach.
arXiv Detail & Related papers (2023-12-02T07:12:00Z) - Compressed and distributed least-squares regression: convergence rates
with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z) - On the Linear Convergence of Policy Gradient under Hadamard
Parameterization [4.182089296199263]
We study the convergence of deterministic policy gradient under the Hadamard parameterization.
We show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations.
arXiv Detail & Related papers (2023-05-31T05:51:15Z) - A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning [3.655021726150368]
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared problem in sequential decision-making.
We derive a novel law of the iterated logarithm for a family of distributed nonlinear approximation schemes that is useful in MARL.
arXiv Detail & Related papers (2021-10-27T08:01:17Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and propose variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Proximal Gradient Descent-Ascent: Variable Convergence under KŁ
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the convergence of proximal-GDA under the Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for the considered step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will eventually converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a proof of convergence covering both the Adam and Adagrad algorithms, with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information provided and is not responsible for any consequences of its use.