Pareto-optimal Tradeoffs Between Communication and Computation with Flexible Gradient Tracking
- URL: http://arxiv.org/abs/2509.18129v1
- Date: Thu, 11 Sep 2025 21:47:24 GMT
- Title: Pareto-optimal Tradeoffs Between Communication and Computation with Flexible Gradient Tracking
- Authors: Yan Huang, Jinming Xu, Li Chai, Jiming Chen, Karl H. Johansson,
- Abstract summary: This paper addresses distributed optimization problems in non-i.i.d. scenarios. We propose FlexGT, a flexible snapshot gradient tracking method with tunable numbers of local updates and neighboring communications in each round. We also introduce an accelerated gossip-based variant, termed Acc-FlexGT, and show that it achieves a Pareto-optimal trade-off between communication and computation.
- Score: 20.36100343422038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses distributed optimization problems in non-i.i.d. scenarios, focusing on the interplay between communication and computation efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method with tunable numbers of local updates and neighboring communications in each round. Leveraging a unified convergence analysis framework, we prove that FlexGT achieves a linear or sublinear convergence rate depending on objective-specific properties--from (strongly) convex to nonconvex--and the above-mentioned tunable parameters. FlexGT is provably robust to the heterogeneity across nodes and attains the best-known communication and computation complexity among existing results. Moreover, we introduce an accelerated gossip-based variant, termed Acc-FlexGT, and show that with prior knowledge of the graph, it achieves a Pareto-optimal trade-off between communication and computation. Particularly, Acc-FlexGT achieves the optimal iteration complexity of $\tilde{\mathcal{O}} \left( L/\epsilon +L\sigma ^2/\left( n\epsilon^2 \sqrt{1-\sqrt{\rho _W}} \right) \right) $ for the nonconvex case, matching the existing lower bound up to a logarithmic factor, and improves the existing results for the strongly convex case by a factor of $\tilde{\mathcal{O}} \left( 1/\sqrt{\epsilon} \right)$, where $\epsilon$ is the targeted accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $\rho_W$ the spectrum gap of the graph, and $\sigma$ the stochastic gradient variance. Numerical examples are provided to demonstrate the effectiveness of the proposed methods.
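The abstract does not spell out FlexGT's update rule, so the following is only a minimal sketch, under stated assumptions, of what a snapshot gradient-tracking round with tunable local updates ($K$) and gossip rounds ($R$) could look like; the function names, the placement of the snapshot correction, and the doubly stochastic mixing matrix `W` are illustrative choices, not the authors' exact method.

```python
import numpy as np

def flexgt_sketch(grad_fns, W, x0, rounds=100, K=2, R=1, lr=0.05):
    """Hypothetical sketch of snapshot gradient tracking with K local
    updates and R gossip rounds per round (not the authors' exact method)."""
    n = len(grad_fns)
    X = np.tile(np.asarray(x0, float), (n, 1))                  # one model copy per node
    snap = np.array([g(X[i]) for i, g in enumerate(grad_fns)])  # snapshot gradients
    Y = snap.copy()                                             # gradient trackers
    for _ in range(rounds):
        for _ in range(K):                                      # computation: K local steps
            X = X - lr * Y
        for _ in range(R):                                      # communication: R gossip rounds
            X = W @ X
            Y = W @ Y
        new_snap = np.array([g(X[i]) for i, g in enumerate(grad_fns)])
        Y = Y + new_snap - snap                                 # tracker correction via snapshots
        snap = new_snap
    return X.mean(axis=0)
```

Here `grad_fns` stands for per-node (stochastic) gradient oracles and `W` for a mixing matrix matched to the communication graph; `K` and `R` play the role of the tunable computation and communication parameters whose trade-off the paper analyzes.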
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time [3.1789549088190414]
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel method, termed Spanning Tree Push-Pull (STPP), which employs two trees extracted from a general communication graph to distribute both model parameters and stochastic gradients.
arXiv Detail & Related papers (2025-03-20T13:11:44Z) - Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transfer the discrete submodular problem into a continuous optimization problem. We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution. Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems. Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning (FL), the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose a Federated Normalized Gradients Algorithm (Fed-NGA), which normalizes the uploaded local gradients to be unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
arXiv Detail & Related papers (2024-08-18T16:50:39Z) - Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach [32.36073823372713]
In training machine-learning models, the algorithm has to communicate with the data center and sample data to compute its gradients.
This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations.
We propose a primal-dual sliding framework with conditional gradient sliding, which is communication-efficient and achieves an $\varepsilon$-approximate solution.
arXiv Detail & Related papers (2024-04-03T06:55:59Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate, analytically and numerically, that the results can be used to inform the preferred choice of nonlinearity for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization [50.28561300894826]
ProxSkip is gaining increasing attention due to its proven benefits in accelerating communication while maintaining robustness against data heterogeneity. We show that ProxSkip achieves linear speedup and can reduce communication overhead proportionally to the probability of communication; a toy sketch of this communication-skipping pattern appears after the related-papers list.
arXiv Detail & Related papers (2023-10-12T02:13:48Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms the existing method with the previously best-known complexity result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Improving the Transient Times for Distributed Stochastic Gradient Methods [5.215491794707911]
We study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS).
We show that EDAS achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD).
To the best of our knowledge, EDAS achieves the shortest transient time when the average of the $n$ cost functions is strongly convex.
arXiv Detail & Related papers (2021-05-11T08:09:31Z) - On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization [29.42301299741866]
We show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with constant step size, together with $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth nonconvex objectives under compressed communication; a toy gossip sketch appears after the related-papers list.
To our knowledge, this is the first work that derives convergence results for nonconvex optimization under compressed communication.
arXiv Detail & Related papers (2020-11-20T21:17:32Z) - $p$-Norm Flow Diffusion for Local Graph Clustering [18.90840167452118]
We present a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p \in (1,\infty)$.
In the context of local clustering, we show their usefulness in finding low-conductance cuts around an input seed set.
We show that the proposed problem can be solved in strongly local running time for $p \ge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate that our approach compares favorably with existing methods.
arXiv Detail & Related papers (2020-05-20T01:08:17Z)
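As a companion to the ProxSkip entry above, here is a minimal, hedged sketch of the communication-skipping pattern it describes: a control-variate-corrected local step at every iteration, with the expensive consensus averaging applied only with probability $p$. The step size, function names, and the use of plain averaging as the prox step are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def proxskip_sketch(grad_fns, x0, T=500, gamma=0.1, p=0.2, seed=0):
    """Hypothetical ProxSkip-style loop: control-variate-corrected local
    steps at every iteration, consensus averaging only with probability p."""
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    X = np.tile(np.asarray(x0, float), (n, 1))   # local models, one row per node
    H = np.zeros_like(X)                         # control variates, one per node
    comms = 0
    for _ in range(T):
        grads = np.array([g(x) for g, x in zip(grad_fns, X)])
        X_hat = X - gamma * (grads - H)          # shifted local gradient step
        if rng.random() < p:                     # communicate only occasionally
            x_bar = X_hat.mean(axis=0)           # averaging plays the prox role
            X_new = np.tile(x_bar, (n, 1))
            H = H + (p / gamma) * (X_new - X_hat)
            X = X_new
            comms += 1
        else:
            X = X_hat
    return X.mean(axis=0), comms
```

On average only a $p$ fraction of the iterations communicate, which is the source of the reduced communication overhead mentioned in the summary.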
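Likewise, the "multiple gossip steps" idea discussed above amounts, in its simplest form, to repeating a mixing step with a doubly stochastic matrix between gradient updates; the toy sketch below (function names and the fixed number of gossip rounds are assumptions) illustrates the pattern rather than the paper's algorithm.

```python
import numpy as np

def gossip(X, W, rounds):
    """Repeat gossip averaging: each round, every node replaces its value
    with a weighted average of its neighbors' values (W doubly stochastic)."""
    for _ in range(rounds):
        X = W @ X
    return X

def gd_with_multiple_gossip(grad_fns, W, x0, iters=200, lr=0.1, gossip_rounds=5):
    """Toy loop: one local gradient step per iteration, followed by several
    gossip rounds that pull the local iterates toward consensus."""
    n = len(grad_fns)
    X = np.tile(np.asarray(x0, float), (n, 1))
    for _ in range(iters):
        G = np.array([g(x) for g, x in zip(grad_fns, X)])
        X = gossip(X - lr * G, W, gossip_rounds)
    return X.mean(axis=0)
```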