Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time
- URL: http://arxiv.org/abs/2503.16123v1
- Date: Thu, 20 Mar 2025 13:11:44 GMT
- Title: Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time
- Authors: Runze You, Shi Pu
- Abstract summary: We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients.
- Score: 3.1789549088190414
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients. Unlike prior approaches that rely heavily on spectral gap properties, STPP leverages a more flexible topological characterization, enabling robust information flow and efficient updates. Theoretically, we prove that STPP achieves linear speedup and polynomial transient iteration complexity, up to $O(n^7)$ for smooth nonconvex objectives and $\tilde{O}(n^3)$ for smooth strongly convex objectives, under arbitrary network topologies. Moreover, compared with the existing methods, STPP achieves faster convergence rates on sparse and non-regular topologies (e.g., directed ring) and reduces communication overhead on dense networks (e.g., static exponential graph). These results significantly advance the state of the art, especially when $n$ is large. Numerical experiments further demonstrate the strong performance of STPP and confirm the practical relevance of its theoretical convergence rates across various common graph architectures. Our code is available at https://anonymous.4open.science/r/SpanningTreePushPull-5D3E.
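The two-spanning-tree mechanism described in the abstract can be pictured with a short simulation. Below is a minimal conceptual sketch, not the paper's actual STPP recursion: the BFS-based tree extraction, the single-root aggregation, and the simplified update rule are illustrative assumptions for a strongly connected directed graph.

```python
# A minimal conceptual sketch of the two-spanning-tree push-pull idea,
# NOT the authors' STPP recursion. The BFS tree extraction, single-root
# aggregation, and simplified update rule are illustrative assumptions
# for a strongly connected directed communication graph.
import numpy as np
import networkx as nx


def extract_trees(graph: nx.DiGraph, root: int = 0):
    """Return a pull tree (edges of `graph`, rooted at `root`) used to spread
    model parameters, and a push tree (built on the reversed graph) used to
    send stochastic-gradient information toward the root."""
    pull_tree = nx.bfs_tree(graph, root)                     # arborescence out of the root
    push_tree = nx.bfs_tree(graph.reverse(copy=True), root)  # arborescence in the reversed graph
    return pull_tree, push_tree


def stpp_like_round(x, local_grads, graph, root=0, lr=0.05):
    """One illustrative round: push (aggregate) gradients to the root,
    take a gradient step there, then pull (broadcast) the new model."""
    pull_tree, push_tree = extract_trees(graph, root)
    n = x.shape[0]

    # Push phase: every node forwards its accumulated gradient to its parent
    # in the push tree, leaves first, until everything reaches the root.
    acc = local_grads.copy()
    for node in reversed(list(nx.topological_sort(push_tree))):
        parent = next(push_tree.predecessors(node), None)
        if parent is not None:
            acc[parent] += acc[node]

    # Root update with the averaged stochastic gradient.
    x[root] -= lr * acc[root] / n

    # Pull phase: parents relay the updated model to their children
    # along the pull tree.
    for node in nx.topological_sort(pull_tree):
        for child in pull_tree.successors(node):
            x[child] = x[node]
    return x


# Tiny usage example on a directed ring of n = 6 agents minimizing
# sum_i 0.5 * ||x - b_i||^2, whose optimum is the mean of the b_i.
n, d = 6, 3
ring = nx.DiGraph([(i, (i + 1) % n) for i in range(n)])
b = np.random.randn(n, d)
x = np.zeros((n, d))
for _ in range(200):
    grads = x - b  # local gradients, one row per agent
    x = stpp_like_round(x, grads, ring)
print("distance to optimum:", np.linalg.norm(x[0] - b.mean(axis=0)))
```

The directed ring used in the example is one of the sparse topologies the abstract highlights as favorable for STPP.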
Related papers
- Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations.
Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z) - Epidemic Learning: Boosting Decentralized Learning with Randomized Communication [8.685687713892193]
We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm.
EL converges up to $1.7\times$ quicker than baseline DL algorithms and attains $2.2\%$ higher accuracy for the same communication volume.
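As a rough illustration of the randomized-communication idea, the sketch below has every node take a local step and then send its model to a few randomly chosen peers, which average whatever they receive; the peer-sample size, step size, and toy quadratic objectives are illustrative assumptions, not EL's exact protocol.

```python
# A minimal sketch of decentralized learning with randomized peer-to-peer
# communication: each node takes a local step, sends its model to s random
# peers, and every node averages what it received. The sample size s, the
# step size, and the toy quadratic objectives are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d, s = 16, 5, 2                        # nodes, model dimension, peers contacted per round
targets = rng.standard_normal((n, d))     # heterogeneous local optima
x = np.zeros((n, d))
lr = 0.1

for _ in range(100):
    # Local step on each node's own objective 0.5 * ||x_i - targets_i||^2.
    x = x - lr * (x - targets)

    # Randomized communication: node i sends its model to s random peers;
    # every node then averages its own model with everything it received.
    inbox = [[x[i]] for i in range(n)]
    for i in range(n):
        peers = rng.choice([j for j in range(n) if j != i], size=s, replace=False)
        for j in peers:
            inbox[j].append(x[i])
    x = np.array([np.mean(msgs, axis=0) for msgs in inbox])

print("spread across nodes:", np.linalg.norm(x - x.mean(axis=0)))
print("avg model vs. global optimum:", np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```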
arXiv Detail & Related papers (2023-10-03T11:28:54Z) - Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence [31.933103173481964]
We propose a novel topology combining both a fast consensus rate and small maximum degree.
The Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph.
arXiv Detail & Related papers (2023-05-19T04:08:07Z) - Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular $\epsilon$-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees [23.256961881716595]
We study a sparse linear regression problem over a network of agents, modeled as an undirected graph with no server node. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
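For intuition on the gradient-tracking mechanism such methods build on, here is a minimal sketch that omits the projection step and assumes Metropolis-style weights on an undirected ring and a toy least-squares problem; none of these choices are taken from the paper itself.

```python
# A minimal sketch of decentralized gradient tracking on a toy
# least-squares problem. The Metropolis ring weights, step size, and
# omission of the projection step are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 8, 20, 30                        # agents, features, samples per agent
A = rng.standard_normal((n, m, d))         # local design matrices
x_true = np.zeros(d)
x_true[:3] = 1.0                           # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal((n, m))

# Doubly stochastic mixing matrix for an undirected ring.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1.0 / 3
    W[i, (i + 1) % n] = 1.0 / 3
    W[i, (i - 1) % n] = 1.0 / 3


def grad(i, xi):
    """Local least-squares gradient at agent i."""
    return A[i].T @ (A[i] @ xi - b[i]) / m


alpha = 0.05
x = np.zeros((n, d))                                  # local iterates
y = np.array([grad(i, x[i]) for i in range(n)])       # gradient trackers
for _ in range(500):
    x_new = W @ x - alpha * y                         # mix, then step along the tracked gradient
    y = (W @ y
         + np.array([grad(i, x_new[i]) for i in range(n)])
         - np.array([grad(i, x[i]) for i in range(n)]))
    x = x_new
print("estimation error:", np.linalg.norm(x.mean(axis=0) - x_true))
```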
arXiv Detail & Related papers (2022-01-21T01:26:08Z) - Distributed Sparse Feature Selection in Communication-Restricted Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\tfrac{\beta/\mu}{1-\rho}}\,\log\tfrac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, the lower communication complexity bounds of distributed gossip algorithms applied to this class of problems.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)