Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex
Optimization
- URL: http://arxiv.org/abs/2310.15976v2
- Date: Thu, 28 Dec 2023 00:59:10 GMT
- Title: Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex
Optimization
- Authors: Zhen Qin, Zhishuai Liu, Pan Xu
- Abstract summary: Existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration.
We show that SignRR has the same convergence rate as signSGD, while SignRVR and SignRVM converge at $O(\log(nT)/\sqrt{nT} + \log(nT)\sqrt{n}/\sqrt{T})$.
We back up our theoretical findings through experiments on simulated and real-world problems.
- Score: 14.69450310695133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: signSGD is popular in nonconvex optimization due to its communication
efficiency. Yet, existing analyses of signSGD rely on assuming that data are
sampled with replacement in each iteration, contradicting the practical
implementation where data are randomly reshuffled and sequentially fed into the
algorithm. We bridge this gap by proving the first convergence result of
signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the
dataset size $n$, the number of epochs of data passes $T$, and the variance
bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same
convergence rate $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$ as signSGD
\citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which
leverage variance-reduced gradients and momentum updates respectively, both
converging at $O(\log (nT)/\sqrt{nT} + \log (nT)\sqrt{n}/\sqrt{T})$. In
contrast with the analysis of signSGD, our results do not require an extremely
large batch size in each iteration to be of the same order as the total number
of iterations \citep{bernstein2018signsgd}, or that the signs of stochastic and
true gradients match element-wise with a minimum probability of 1/2
\citep{safaryan2021stochastic}. We also extend our algorithms to cases where
data are distributed across different machines, yielding dist-SignRVR and
dist-SignRVM, both converging at $O(\log (n_0T)/\sqrt{n_0T} + \log
(n_0T)\sqrt{n_0}/\sqrt{T})$, where $n_0$ is the dataset size of a single
machine. We back up our theoretical findings through experiments on simulated
and real-world problems, verifying that randomly reshuffled sign methods match
or surpass existing baselines.
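
To make the random-reshuffling setting concrete, below is a minimal, hedged Python sketch of a sign-based update applied to data that are permuted once per epoch and fed sequentially, in the spirit of SignRR described above. The constant step size, the optional momentum average (loosely echoing SignRVM), and the toy least-squares problem are illustrative assumptions, not the paper's pseudocode or tuned hyperparameters.

```python
# Illustrative sketch (not the paper's pseudocode): sign-based updates over
# randomly reshuffled data, i.e., each epoch permutes the n samples once and
# feeds them sequentially instead of sampling with replacement.
import numpy as np

def sign_rr(grad_fn, x0, n, num_epochs=10, lr=0.01, momentum=None, seed=0):
    """grad_fn(x, i) returns the gradient of the i-th sample's loss at x.
    If `momentum` is set (e.g. 0.9), the sign of an exponential moving
    average of gradients is used instead (a SignRVM-like heuristic)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for _ in range(num_epochs):
        for i in rng.permutation(n):        # reshuffle once per epoch
            g = grad_fn(x, i)
            if momentum is not None:
                m = momentum * m + (1.0 - momentum) * g
                direction = np.sign(m)
            else:
                direction = np.sign(g)      # only the sign is kept: 1 bit per coordinate
            x -= lr * direction
    return x

# Toy usage on a synthetic least-squares problem (hypothetical example).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 200, 5
    A, b = rng.normal(size=(n, d)), rng.normal(size=n)
    grad_fn = lambda x, i: 2.0 * (A[i] @ x - b[i]) * A[i]
    x_hat = sign_rr(grad_fn, np.zeros(d), n, num_epochs=50, lr=0.005)
    print("final objective:", np.mean((A @ x_hat - b) ** 2))
```

The communication appeal of sign methods comes from transmitting only one bit per coordinate; the distributed variants in the abstract aggregate such signs across machines, which this single-machine sketch does not attempt.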
Related papers
- Variable Selection in Convex Piecewise Linear Regression [5.366354612549172]
This paper presents Sparse Gradient Descent (SpGD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for SpGD under sub-Gaussian noise.
arXiv Detail & Related papers (2024-11-04T16:19:09Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - On the Convergence of a Federated Expectation-Maximization Algorithm [0.0]
We study the convergence rate of the Expectation-Maximization (EM) algorithm for the Federated Mixture of $K$ Linear Regressions model.
Surprisingly, the results show that rather than being a bottleneck, data heterogeneity can accelerate the convergence of Federated Learning algorithms.
arXiv Detail & Related papers (2024-08-11T16:46:42Z) - Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction [16.82220229840038]
We introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$, respectively.
Numerical experiments across different tasks validate the effectiveness of our proposed methods.
arXiv Detail & Related papers (2024-06-01T16:38:43Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation with workers that communicate at varying frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1})$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We derive lower bounds on the number of communication rounds needed to reach a given suboptimality $\epsilon > 0$ over master/workers networks.
We then propose algorithms matching the lower bounds for either type of network (up to log factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an optimization error at the order of the intrinsic excess error of a learning model.
For generic loss functions, we prove that HSDMPG attains an error bound of the same order.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - On the Minimax Optimality of the EM Algorithm for Learning Two-Component
Mixed Linear Regression [30.09238179576628]
We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR).
We show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes.
arXiv Detail & Related papers (2020-06-04T00:56:07Z) - Non-asymptotic Convergence of Adam-type Reinforcement Learning
Algorithms under Markovian Sampling [56.394284787780364]
This paper provides the first theoretical convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2020-02-15T00:26:49Z) - Tight Regret Bounds for Noisy Optimization of a Brownian Motion [118.65407541895851]
We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise.
We show that the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $\Omega(\sigma\sqrt{T \cdot \log T})$.
arXiv Detail & Related papers (2020-01-25T14:44:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.