Optimal and Efficient Algorithms for Decentralized Online Convex Optimization
- URL: http://arxiv.org/abs/2402.09173v3
- Date: Wed, 11 Dec 2024 06:07:59 GMT
- Title: Optimal and Efficient Algorithms for Decentralized Online Convex Optimization
- Authors: Yuanyu Wan, Tong Wei, Bo Xue, Mingli Song, Lijun Zhang,
- Abstract summary: Decentralized online convex optimization (D-OCO) is designed to minimize a sequence of global loss functions using only local computations and communications.
We develop a novel D-OCO algorithm that can reduce the regret bounds for convex and strongly convex functions to $tildeO(nrho-1/4sqrtT)$ and $tildeO(nrho-1/2log T)$.
Our analysis reveals that the projection-free variant can achieve $O(nT3/4)$ and $O(n
- Score: 51.00357162913229
- License:
- Abstract: We investigate decentralized online convex optimization (D-OCO), in which a set of local learners are required to minimize a sequence of global loss functions using only local computations and communications. Previous studies have established $O(n^{5/4}\rho^{-1/2}\sqrt{T})$ and ${O}(n^{3/2}\rho^{-1}\log T)$ regret bounds for convex and strongly convex functions respectively, where $n$ is the number of local learners, $\rho<1$ is the spectral gap of the communication matrix, and $T$ is the time horizon. However, there exist large gaps from the existing lower bounds, i.e., $\Omega(n\sqrt{T})$ for convex functions and $\Omega(n)$ for strongly convex functions. To fill these gaps, in this paper, we first develop a novel D-OCO algorithm that can respectively reduce the regret bounds for convex and strongly convex functions to $\tilde{O}(n\rho^{-1/4}\sqrt{T})$ and $\tilde{O}(n\rho^{-1/2}\log T)$. The primary technique is to design an online accelerated gossip strategy that enjoys a faster average consensus among local learners. Furthermore, by carefully exploiting spectral properties of a specific network topology, we enhance the lower bounds for convex and strongly convex functions to $\Omega(n\rho^{-1/4}\sqrt{T})$ and $\Omega(n\rho^{-1/2}\log T)$, respectively. These results suggest that the regret of our algorithm is nearly optimal in terms of $T$, $n$, and $\rho$ for both convex and strongly convex functions. Finally, we propose a projection-free variant of our algorithm to efficiently handle practical applications with complex constraints. Our analysis reveals that the projection-free variant can achieve ${O}(nT^{3/4})$ and ${O}(nT^{2/3}(\log T)^{1/3})$ regret bounds for convex and strongly convex functions with nearly optimal $\tilde{O}(\rho^{-1/2}\sqrt{T})$ and $\tilde{O}(\rho^{-1/2}T^{1/3}(\log T)^{2/3})$ communication rounds, respectively.
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient ascent (TTGDA) for solving structured non minimax optimization problems.
Our contribution is to design TTGDA algorithms are effective beyond the setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Online Newton Method for Bandit Convex Optimisation [28.66596225688161]
We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation.
We prove that in the adversarial setting its regret is at most $d3.5 sqrtn mathrmpolylog(n, d)$ with high probability where $d$ is the time horizon.
In the setting the bound improves to $M d2 sqrtn mathrmpolylog(n, d)$ where $M in [d-1/2, d-1 / 4]$ is
arXiv Detail & Related papers (2024-06-10T17:44:11Z) - Fast Rates in Stochastic Online Convex Optimization by Exploiting the Curvature of Feasible Sets [35.8717656676532]
In online linear optimization, it is known that if the average gradient of loss functions exceeds a certain threshold, the curvature of feasible sets can be exploited.
This study reveals that algorithms adaptive to the curvature of loss functions can also leverage the curvature of feasible sets.
arXiv Detail & Related papers (2024-02-20T09:59:33Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for BFG convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization [21.334985032433778]
Decentralized optimization problem $min_bf xinmathbb Rd f(bf x)triq frac1msum_i=1m f_i(bf x)triq frac1nsum_j=1n.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Decentralized Stochastic Variance Reduced Extragradient Method [25.21457349137344]
This paper studies decentralized convex-concave minimax optimization problems of the form $min_xmax_y fx,y triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msum
arXiv Detail & Related papers (2022-02-01T16:06:20Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $min(sqrtdT, T3/4)$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.