On Generalization of Decentralized Learning with Separable Data
- URL: http://arxiv.org/abs/2209.07116v4
- Date: Mon, 27 Mar 2023 07:28:31 GMT
- Title: On Generalization of Decentralized Learning with Separable Data
- Authors: Hossein Taheri, Christos Thrampoulidis
- Abstract summary: We study algorithmic and generalization properties of decentralized learning with gradient descent on separable data.
Specifically, for decentralized gradient descent and a variety of loss functions that asymptote to zero at infinity, we derive novel finite-time generalization bounds.
- Score: 37.908159361149835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning offers privacy and communication efficiency when data
are naturally distributed among agents communicating over an underlying graph.
Motivated by overparameterized learning settings, in which models are trained
to zero training loss, we study algorithmic and generalization properties of
decentralized learning with gradient descent on separable data. Specifically,
for decentralized gradient descent (DGD) and a variety of loss functions that
asymptote to zero at infinity (including exponential and logistic losses), we
derive novel finite-time generalization bounds. This complements a long line of
recent work that studies the generalization performance and the implicit bias
of gradient descent over separable data, but has thus far been limited to
centralized learning scenarios. Notably, our generalization bounds
approximately match in order their centralized counterparts. Critical behind
this, and of independent interest, is establishing novel bounds on the training
loss and the rate-of-consensus of DGD for a class of self-bounded losses.
Finally, on the algorithmic front, we design improved gradient-based routines
for decentralized learning with separable data and empirically demonstrate
orders-of-magnitude speed-up in terms of both training and
performance.
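To make the setting concrete, here is a minimal sketch of the DGD iteration the abstract refers to: each agent mixes its neighbors' iterates through a doubly stochastic matrix and then takes a local gradient step on the logistic loss over its own (linearly separable) samples. The ring graph, mixing weights, step size, and synthetic data below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n agents, each holding m samples that are linearly separable
# because the labels come from a common ground-truth separator w_star (assumption).
n_agents, m, d = 5, 20, 10
w_star = rng.normal(size=d)
X = rng.normal(size=(n_agents, m, d))
y = np.sign(X @ w_star)

# Doubly stochastic mixing matrix for a ring graph: each agent averages with its two neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i, j % n_agents] = 1.0 / 3.0

def logistic_grad(w, Xi, yi):
    """Gradient of (1/m) * sum_k log(1 + exp(-y_k x_k^T w)) at w."""
    margins = np.clip(yi * (Xi @ w), -50.0, 50.0)   # clip only to avoid overflow warnings
    sigma = 1.0 / (1.0 + np.exp(margins))
    return -(Xi * (yi * sigma)[:, None]).mean(axis=0)

# DGD: mix neighbors' models, then take a local gradient step on the local loss.
eta = 0.5
models = np.zeros((n_agents, d))
for t in range(2000):
    mixed = W @ models
    grads = np.stack([logistic_grad(models[i], X[i], y[i]) for i in range(n_agents)])
    models = mixed - eta * grads

avg_model = models.mean(axis=0)
consensus_gap = np.linalg.norm(models - avg_model, axis=1).max()
train_err = np.mean(np.sign(X.reshape(-1, d) @ avg_model) != y.reshape(-1))
print(f"max consensus gap: {consensus_gap:.2e}, training error of averaged model: {train_err:.3f}")
```

On separable data the iterates never stop growing because the loss only asymptotes to zero, which is why the paper derives finite-time bounds on the training loss and the rate-of-consensus rather than convergence to a minimizer.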
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-IID data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Understanding Generalization of Federated Learning via Stability:
Heterogeneity Matters [1.4502611532302039]
Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications.
arXiv Detail & Related papers (2023-06-06T16:12:35Z) - CoDeC: Communication-Efficient Decentralized Continual Learning [6.663641564969944]
Training at the edge utilizes continuously evolving data generated at different locations.
Privacy concerns prohibit the co-location of this spatially as well as temporally distributed data.
We propose CoDeC, a novel communication-efficient decentralized continual learning algorithm.
arXiv Detail & Related papers (2023-03-27T16:52:17Z) - Straggler-Resilient Differentially-Private Decentralized Learning [22.399703712241546]
We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy.
Analytical results on both the convergence speed and the differential privacy (DP) level are derived for a skipping scheme and a baseline scheme.
arXiv Detail & Related papers (2022-12-06T15:50:28Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes this metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data are heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the setting of fully decentralized computation.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z) - Linear Regression with Distributed Learning: A Generalization Error
Perspective [0.0]
We investigate the performance of distributed learning for large-scale linear regression.
We focus on the generalization error, i.e., the performance on unseen data.
Our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution.
arXiv Detail & Related papers (2021-01-22T08:43:28Z) - Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size grows, decentralized learning faces a major bottleneck: the communication load of each node transmitting messages (model updates) to its neighbors.
We propose a quantized decentralized learning algorithm over directed graphs, based on the push-sum protocol from decentralized consensus optimization (a rough sketch of a quantized push-sum step appears after this list).
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
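The last entry builds on push-sum consensus for directed graphs. As a rough, hypothetical illustration of that building block (not the paper's algorithm), the sketch below runs push-sum over a directed ring with column-stochastic weights and naively quantizes the transmitted vectors each round; the quantizer, weights, and graph are assumptions, and the paper's scheme controls quantization error more carefully.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4

# Column-stochastic weights for a directed ring: node i keeps a fraction of its mass
# and pushes the rest to its out-neighbor (i + 1) % n, so each column sums to 1.
keep = 0.3 + 0.4 * rng.random(n)
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = keep[i]
    A[(i + 1) % n, i] = 1.0 - keep[i]

def quantize(v, step=0.01):
    """Uniform quantizer (illustrative): round coordinates to a grid of width `step`."""
    return step * np.round(v / step)

x = rng.normal(size=(n, d))   # local vectors; the goal is to agree on their average
target = x.mean(axis=0)
w = np.ones(n)                # push-sum correction weights

for t in range(200):
    qx = quantize(x)          # each node quantizes its vector before the directed gossip step
    x = A @ qx                # row i accumulates the (quantized) mass pushed by its in-neighbors
    w = A @ w                 # the scalar weights follow the same mixing

z = x / w[:, None]            # de-biased estimates x_i / w_i
print("max deviation from the true average:", np.abs(z - target).max())
```

The ratio x_i / w_i corrects for the uneven, column-stochastic (but not doubly stochastic) mixing, which is what lets push-sum average over directed graphs where doubly stochastic weights may not exist.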