Related papers: CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

URL: http://arxiv.org/abs/2301.05872v2
Date: Mon, 26 Feb 2024 01:26:05 GMT
Title: CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence
Authors: Kun Huang and Shi Pu
Abstract summary: In this paper, we consider solving the distributed optimization problem of a multi-agent network under the communication restricted setting. We show the method compressed exact diffusion termed convexizes (CEDAS)", and show the method achieves adaptive steps for both smooth convex-related steps.
Score: 10.770843226843418
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we consider solving the distributed optimization problem over a multi-agent network under the communication restricted setting. We study a compressed decentralized stochastic gradient method, termed ``compressed exact diffusion with adaptive stepsizes (CEDAS)", and show the method asymptotically achieves comparable convergence rate as centralized { stochastic gradient descent (SGD)} for both smooth strongly convex objective functions and smooth nonconvex objective functions under unbiased compression operators. In particular, to our knowledge, CEDAS enjoys so far the shortest transient time (with respect to the graph specifics) for achieving the convergence rate of centralized SGD, which behaves as $\mathcal{O}(n{C^3}/(1-\lambda_2)^{2})$ under smooth strongly convex objective functions, and $\mathcal{O}(n^3{C^6}/(1-\lambda_2)^4)$ under smooth nonconvex objective functions, where $(1-\lambda_2)$ denotes the spectral gap of the mixing matrix, and $C>0$ is the compression-related parameter. Numerical experiments further demonstrate the effectiveness of the proposed algorithm.

Related papers

Theoretical Framework for Tempered Fractional Gradient Descent: Application to Breast Cancer Classification [0.0]
This paper introduces Fractional Gradient Descent (TFGD), a novel optimization framework that synergizes fractional calculus with exponential tempering to enhance gradient-based learning. TFGD addresses limitations by incorporating a tempered memory mechanism, where historical gradients are weighted by fractional coefficients $|w_j| = binomalphaj$ and exponentially decayed via a tempering parameter $lambda$. Empirical validation on the Breast Cancer Wisconsin dataset demonstrates TFGD's superiority, achieving 98.25% test accuracy (vs. 92.11% for SGD) and 2$times$ faster convergence.
arXiv Detail & Related papers (2025-04-26T08:26:34Z)
Enhancing Convergence of Decentralized Gradient Tracking under the KL Property [10.925931212031692]
We establish convergence of the same type for the notorious decentralized gradient-tracking-based algorithm SONATA.<n>This matches the convergence behavior of centralized-gradient algorithms except when.<n>thetain (1/2,1)$, sub rate is certified; and.<n>textbf(iii)$ when $thetain (1/2,1)$, sub rate is certified; and.<n>textbf(iii)$ when $thetain (1/2,1)$, sub rate is certified.
arXiv Detail & Related papers (2024-12-12T18:44:36Z)
Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient ascent (TTGDA) for solving structured non minimax optimization problems.<n>Our contribution is to design TTGDA algorithms are effective beyond the setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z)
Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density. We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order. We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular due to minimizing the number of misses and good cache. This paper delves into the convergence properties SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z)
Zero-Order One-Point Estimate with Distributed Stochastic Gradient-Tracking Technique [23.63073074337495]
We consider a distributed multi-agent optimization problem where each agent holds a local objective function that is smooth and convex. We extend the distributed gradient-tracking method to the bandit setting where we don't have an estimate of the gradient. We analyze the convergence of this novel technique for smooth and convex objectives using approximation tools.
arXiv Detail & Related papers (2022-10-11T17:04:45Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$. We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency. In this work, we obtain a tighter convergence rate of $mathcalO!!!(sigma2-2_avg!! . We also show that the heterogeneity term in rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z)
DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted tracking algorithm $ttDoCoM$ for communication. We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z)
Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance. Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
Improving the Transient Times for Distributed Stochastic Gradient Methods [5.215491794707911]
We study a distributed gradient algorithm, called exact diffusion adaptive stepsizes (EDAS) We show EDAS achieves the same network independent convergence rate as centralized gradient descent (SGD) To the best of our knowledge, EDAS achieves the shortest time when the average of the $n$ cost functions is strongly convex.
arXiv Detail & Related papers (2021-05-11T08:09:31Z)
A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, em variance controlled gradient (VCSG), to improve the performance of the reduced gradient (SVRG) $lambda$ is introduced in VCSG to avoid over-reducing a variance by SVRG. $mathcalO(min1/epsilon3/2,n1/4/epsilon)$ number of gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
S-ADDOPT: Decentralized stochastic first-order optimization over directed graphs [16.96562173221624]
Decentralized convex optimization is proposed to minimize a sum of smooth and strongly cost functions when the functions are distributed over a directed network nodes. In particular, we propose thetextbftextttS-ADDOPT algorithm that assumes a first-order oracle at each node. For decaying step-sizes$mathcalO (1/k)$, we show thattextbfttS-ADDOPT reaches the exact solution subly at$mathcalO (1/k)$ and its convergence is networkally-independent
arXiv Detail & Related papers (2020-05-15T21:14:22Z)
Stochastic gradient-free descents [8.663453034925363]
We propose gradient-free methods and accelerated gradients with momentum for solving optimization problems. We analyze the convergence behavior of these methods under the mean-variance framework.
arXiv Detail & Related papers (2019-12-31T13:56:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.