Yes, Topology Matters in Decentralized Optimization: Refined Convergence and Topology Learning under Heterogeneous Data
- URL: http://arxiv.org/abs/2204.04452v1
- Date: Sat, 9 Apr 2022 11:29:12 GMT
- Title: Yes, Topology Matters in Decentralized Optimization: Refined Convergence and Topology Learning under Heterogeneous Data
- Authors: B. Le Bars and A. Bellet and M. Tommasi and AM. Kermarrec
- Abstract summary: We revisit the analysis of the Decentralized Stochastic Gradient Descent algorithm (D-SGD), a popular decentralized learning algorithm, under data heterogeneity.
We exhibit the key role played by a new quantity, which we call neighborhood heterogeneity, on the convergence rate of D-SGD.
By coupling the topology and the heterogeneity of the agents' distributions, our analysis sheds light on the poorly understood interplay between these two concepts in decentralized learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the key challenges in federated and decentralized learning is to
design algorithms that efficiently deal with highly heterogeneous data
distributions across agents. In this paper, we revisit the analysis of
Decentralized Stochastic Gradient Descent algorithm (D-SGD), a popular
decentralized learning algorithm, under data heterogeneity. We exhibit the key
role played by a new quantity, which we call neighborhood heterogeneity, on the
convergence rate of D-SGD. Unlike prior work, neighborhood heterogeneity is
measured at the level of the neighborhood of an agent in the graph topology. By
coupling the topology and the heterogeneity of the agents' distributions, our
analysis sheds light on the poorly understood interplay between these two
concepts in decentralized learning. We then argue that neighborhood
heterogeneity provides a natural criterion to learn sparse data-dependent
topologies that reduce (and can even eliminate) the otherwise detrimental
effect of data heterogeneity on the convergence time of D-SGD. For the
important case of classification with label skew, we formulate the problem of
learning such a good topology as a tractable optimization problem that we solve
with a Frank-Wolfe algorithm. Our approach provides a principled way to design
a sparse topology that balances the number of iterations and the per-iteration
communication costs of D-SGD under data heterogeneity.
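To make the two ingredients of the abstract concrete, the sketch below runs D-SGD (each agent takes a local stochastic gradient step and then gossip-averages its parameters with its neighbors through a mixing matrix W) on a toy heterogeneous least-squares problem, and evaluates a crude proxy for heterogeneity measured at the neighborhood level. The ring topology, the local objectives, and the neighborhood_heterogeneity_proxy function are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
# Minimal D-SGD sketch on a synthetic least-squares problem, plus a crude proxy
# for heterogeneity measured over graph neighborhoods. All names, the toy data,
# and the exact form of the proxy are assumptions made for this illustration.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 8, 5

# Heterogeneous local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 / m_i,
# with an agent-dependent shift that mimics label skew.
A = [rng.normal(size=(20, dim)) for _ in range(n_agents)]
b = [A[i] @ rng.normal(size=dim) + i * np.ones(20) for i in range(n_agents)]

def local_grad(i, x, batch=5):
    """Mini-batch stochastic gradient of agent i's objective at x."""
    idx = rng.choice(len(b[i]), size=batch, replace=False)
    return A[i][idx].T @ (A[i][idx] @ x - b[i][idx]) / batch

def full_grad(i, x):
    """Exact local gradient of agent i's objective at x."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Ring topology with a symmetric, doubly stochastic mixing matrix W.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def neighborhood_heterogeneity_proxy(W, x):
    """Mean squared distance between W-weighted neighborhood gradients and the
    global gradient at a common point x (an illustrative proxy only)."""
    grads = np.stack([full_grad(i, x) for i in range(n_agents)])
    return np.mean(np.sum((W @ grads - grads.mean(axis=0)) ** 2, axis=1))

# D-SGD: local stochastic gradient step, then gossip averaging with W.
X = np.zeros((n_agents, dim))
lr = 0.01
for _ in range(200):
    G = np.stack([local_grad(i, X[i]) for i in range(n_agents)])
    X = W @ (X - lr * G)

print("consensus spread   :", np.linalg.norm(X - X.mean(axis=0)))
print("heterogeneity proxy:", neighborhood_heterogeneity_proxy(W, X.mean(axis=0)))
```

In the spirit of the abstract, learning a topology would then amount to choosing a sparse doubly stochastic W (for instance with a Frank-Wolfe method over a suitable constraint set) that keeps a quantity of this kind small while limiting per-iteration communication; the paper does this explicitly for classification with label skew.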
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase (a generic sketch of such an extrapolation step is given after this list).
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different factors on the generalization of D-SGDA.
We also balance the optimization error against the generalization error to characterize the optimal trade-off in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
- Distributed Linear Regression with Compositional Covariates [5.085889377571319]
We focus on the distributed sparse penalized linear log-contrast model in massive compositional data.
Two distributed optimization techniques are proposed for solving the two different constrained convex optimization problems.
In the decentralized topology, we introduce a distributed coordinate-wise descent algorithm for obtaining a communication-efficient regularized estimation.
arXiv Detail & Related papers (2023-10-21T11:09:37Z)
- Addressing Data Heterogeneity in Decentralized Learning via Topological Pre-processing [0.9645196221785693]
We show the advantages of constructing a proxy-based locally heterogeneous DL topology to enhance convergence and maintain data privacy.
We propose a novel peer clumping strategy to efficiently cluster peers before arranging them in a final training graph.
arXiv Detail & Related papers (2022-12-16T22:46:38Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis [8.863089484787835]
We propose a distributed approximate Newton-type training scheme, namely FedOVA, to address the statistical challenge brought by heterogeneous data.
FedOVA decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning (a plain one-vs-all sketch is given after this list).
arXiv Detail & Related papers (2021-10-14T17:35:24Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the setting of fully decentralized computation.
We theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone settings (a single-machine extra-gradient sketch is given after this list).
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Domain Adaptation by Topology Regularization [0.0]
Domain adaptation (DA) or transfer learning (TL) enables algorithms to transfer knowledge from a labelled (source) data set to an unlabelled but related (target) data set of interest.
We propose to leverage global data structure by applying a topological data analysis technique called persistent homology to TL.
arXiv Detail & Related papers (2021-01-28T16:45:41Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
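The DFedCata entry above attributes its speed-up to a Nesterov-style extrapolation wrapped around the aggregation phase. Below is a minimal, generic illustration of such an extrapolation on averaged model parameters; the aggregate and nesterov_extrapolate helpers, the momentum value, and the toy data are assumptions for this sketch and do not reproduce the DFedCata update.

```python
# Generic Nesterov-style extrapolation applied after an aggregation step.
# This is NOT the DFedCata algorithm; it only illustrates the extrapolation idea
# mentioned in the summary, with all quantities chosen for the sketch.
import numpy as np

def aggregate(models):
    """Plain averaging of client parameter vectors (stand-in for aggregation)."""
    return np.mean(models, axis=0)

def nesterov_extrapolate(x_new, x_prev, momentum=0.9):
    """Step beyond the latest aggregate in the direction of recent progress."""
    return x_new + momentum * (x_new - x_prev)

rng = np.random.default_rng(1)
models = rng.normal(size=(4, 10))                       # 4 clients, 10 parameters each
x_prev = aggregate(models)
models = models - 0.1 * rng.normal(size=models.shape)   # pretend one round of local training
x_new = aggregate(models)
x_next = nesterov_extrapolate(x_new, x_prev)            # point broadcast for the next round
print(x_next.shape)
```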
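The FedOVA entry above rests on a one-vs-all decomposition: one binary classifier per class, combined by an ensemble rule. The sketch below shows that decomposition in a plain centralized form with scikit-learn logistic regressions; it is not the federated algorithm itself, and the dataset and the probability-based combination rule are assumptions.

```python
# One-vs-all decomposition of a multi-class problem into binary problems, with
# a simple ensemble-style combination of the per-class scores (centralized sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# One binary classifier per class: "class k" vs. "rest".
binary_models = [LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int))
                 for k in range(4)]

# Combine: each binary model scores its class; predict the highest-scoring class.
scores = np.column_stack([m.predict_proba(X)[:, 1] for m in binary_models])
y_pred = scores.argmax(axis=1)
print("training accuracy of the one-vs-all ensemble:", (y_pred == y).mean())
```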
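The variational-inequality entry above builds on the extra-gradient method. For reference, here is a single-machine extra-gradient loop on the bilinear saddle-point problem min_x max_y x^T A y, a standard monotone VI; the matrix, step size, and iteration count are chosen for illustration, and the decentralized local-update variant analyzed in that paper is not reproduced.

```python
# Extra-gradient method on a bilinear saddle-point problem min_x max_y x^T A y.
# Single-machine illustration of the base method only; not the decentralized
# local extra-gradient algorithm from the cited paper.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))          # the unique saddle point is (x, y) = (0, 0)

def vi_operator(x, y):
    """VI operator F(x, y) = (grad_x, -grad_y) of the objective x^T A y."""
    return A @ y, -(A.T @ x)

x, y = rng.normal(size=5), rng.normal(size=5)
step = 0.1                           # small relative to ||A|| for this sketch
for _ in range(5000):
    gx, gy = vi_operator(x, y)
    xh, yh = x - step * gx, y - step * gy     # extrapolation (look-ahead) step
    gx, gy = vi_operator(xh, yh)
    x, y = x - step * gx, y - step * gy       # update with the look-ahead operator

print("distance to the saddle point:", np.linalg.norm(x) + np.linalg.norm(y))
```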
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.