Generalization Error Matters in Decentralized Learning Under Byzantine Attacks
- URL: http://arxiv.org/abs/2407.08632v1
- Date: Thu, 11 Jul 2024 16:12:53 GMT
- Title: Generalization Error Matters in Decentralized Learning Under Byzantine Attacks
- Authors: Haoxiang Ye, Qing Ling
- Abstract summary: Decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm.
We provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms.
- Score: 22.589653582068117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm that enables model training across geographically distributed agents in a scalable manner, without the presence of any central server. When some of the agents are malicious (also termed Byzantine), resilient decentralized learning algorithms are able to limit the impact of these Byzantine agents without knowing their number and identities, and have guaranteed optimization errors. However, analysis of the generalization errors, which are critical to the deployment of the trained models, is still lacking. In this paper, we provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms. Our theoretical results reveal that the generalization errors cannot be entirely eliminated due to the presence of the Byzantine agents, even if the number of training samples is infinitely large. Numerical experiments are conducted to confirm our theoretical results.
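To make the setting concrete, below is a minimal numpy sketch of one popular family of Byzantine-resilient DSGD rules: each regular agent aggregates the models it receives with a coordinate-wise trimmed mean before taking a local stochastic gradient step. The fully connected topology, least-squares task, Gaussian-noise attack, and trimming parameter are all illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_byz, dim, lr, b = 10, 2, 5, 0.1, 2  # b: extremes trimmed per side

# Each regular agent holds local data from a common linear model (least squares).
w_true = rng.normal(size=dim)
X = [rng.normal(size=(50, dim)) for _ in range(n_agents)]
y = [Xi @ w_true + 0.1 * rng.normal(size=50) for Xi in X]
w = [np.zeros(dim) for _ in range(n_agents)]

def trimmed_mean(msgs, b):
    """Coordinate-wise trimmed mean: drop the b largest and b smallest
    values per coordinate, then average the rest."""
    s = np.sort(np.stack(msgs), axis=0)
    return s[b:len(msgs) - b].mean(axis=0)

for t in range(200):
    msgs = []
    for i in range(n_agents):
        if i < n_byz:                      # Byzantine agents send garbage
            msgs.append(rng.normal(scale=10.0, size=dim))
        else:
            msgs.append(w[i])
    for i in range(n_byz, n_agents):       # regular agents update
        idx = rng.integers(0, 50, size=8)  # minibatch stochastic gradient
        g = X[i][idx].T @ (X[i][idx] @ w[i] - y[i][idx]) / len(idx)
        w[i] = trimmed_mean(msgs, b) - lr * g

err = np.mean([np.linalg.norm(w[i] - w_true) for i in range(n_byz, n_agents)])
print(f"mean distance to w_true over regular agents: {err:.3f}")
```

The robust aggregation bounds the Byzantine influence on the optimization error, but, as the abstract notes, the surviving perturbation is exactly why the generalization error cannot be driven to zero even with unlimited data.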
Related papers
- Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different factors, including the communication topology, on the generalization of D-SGDA.
We also balance the generalization error against the optimization error to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
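For readers unfamiliar with the primitive behind D-SGDA above, the sketch below runs plain (single-agent) stochastic gradient descent ascent on a toy strongly-convex-strongly-concave saddle problem; the decentralized version additionally averages each agent's iterates with its neighbors through a mixing matrix. The objective, step size, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr, sigma = 3, 0.05, 0.1
A = rng.normal(size=(d, d))
x, y = rng.normal(size=d), rng.normal(size=d)

# f(x, y) = 0.5||x||^2 + x^T A y - 0.5||y||^2 is strongly convex in x and
# strongly concave in y, with its saddle point at (0, 0).
for t in range(2000):
    gx = x + A @ y + sigma * rng.normal(size=d)    # noisy gradient in x
    gy = A.T @ x - y + sigma * rng.normal(size=d)  # noisy gradient in y
    x, y = x - lr * gx, y + lr * gy                # descent in x, ascent in y

# Both norms settle near the saddle point, up to a noise floor set by sigma.
print(np.linalg.norm(x), np.linalg.norm(y))
```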
- Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization [18.03833857491361]
A common approach to enhancing privacy involves training models on anonymized data rather than on individual records.
We provide an analysis of how training models using anonymous cluster centers affects their generalization capabilities.
In certain high-dimensional regimes, training on anonymous cluster centers acts as a regularizer and reduces the generalization error of the trained models.
arXiv Detail & Related papers (2023-10-06T04:52:46Z)
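A toy sketch of the look-alike idea described above: replace individual records with their k-means cluster centers, average the labels within each cluster, and train on the centers only. The data-generating model, cluster count, and ridge regressor are illustrative assumptions; the paper's contribution is a precise high-dimensional analysis, not this recipe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d, k = 400, 20, 40
w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w + 0.5 * rng.normal(size=n)

# Anonymize: each point is replaced by its cluster center; labels are
# averaged within a cluster, so no individual record is used directly.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
Xc = km.cluster_centers_
yc = np.array([y[km.labels_ == j].mean() for j in range(k)])

Xt = rng.normal(size=(1000, d))             # fresh test data
yt = Xt @ w + 0.5 * rng.normal(size=1000)
for name, (A, b) in {"individual": (X, y), "anonymous": (Xc, yc)}.items():
    model = Ridge(alpha=1.0).fit(A, b)
    print(name, "test MSE:", ((model.predict(Xt) - yt) ** 2).mean())
```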
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network's generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
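One plausible reading of "uncertain feature representation" in the paper above is to treat per-channel feature statistics as random variables and resample them during training. The numpy sketch below illustrates that reading; it is an assumption-laden illustration, not necessarily the paper's exact module.

```python
import numpy as np

rng = np.random.default_rng(0)

def uncertain_stats(x, training=True):
    """Treat per-channel feature statistics as Gaussian random variables.
    x: (batch, channels, height, width) feature map."""
    if not training:
        return x
    mu = x.mean(axis=(2, 3), keepdims=True)          # instance mean, (B,C,1,1)
    sig = x.std(axis=(2, 3), keepdims=True) + 1e-6   # instance std
    # Uncertainty of the statistics themselves, estimated across the batch.
    s_mu = mu.std(axis=0, keepdims=True)
    s_sig = sig.std(axis=0, keepdims=True)
    beta = mu + rng.normal(size=mu.shape) * s_mu     # resampled mean
    gamma = sig + rng.normal(size=sig.shape) * s_sig # resampled std
    return gamma * (x - mu) / sig + beta

x = rng.normal(loc=2.0, scale=3.0, size=(8, 4, 16, 16))
print(uncertain_stats(x).shape)  # (8, 4, 16, 16), statistics now perturbed
```

Note the module adds no trainable parameters, consistent with the summary above.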
- GAN based Data Augmentation to Resolve Class Imbalance [0.0]
In many fraud-detection tasks, the datasets contain only a small number of observed fraud cases.
This imbalance can bias a learning model toward predicting every label as the majority class.
We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class.
arXiv Detail & Related papers (2022-06-12T21:21:55Z)
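A minimal PyTorch sketch of the approach described above: fit a small GAN on minority-class samples only, then draw synthetic examples to rebalance the training set. The architectures, stand-in data, and training schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, z_dim = 16, 8
minority = torch.randn(64, d) * 0.5 + 2.0   # stand-in for the rare class

G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, d))
D = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(500):
    # --- discriminator: real minority samples vs. generator output ---
    z = torch.randn(64, z_dim)
    fake = G(z).detach()
    loss_d = bce(D(minority), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # --- generator: fool the discriminator ---
    z = torch.randn(64, z_dim)
    loss_g = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Oversample: append synthetic minority examples to rebalance the training set.
synthetic = G(torch.randn(256, z_dim)).detach()
print(synthetic.shape)  # (256, 16)
```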
- Secure Distributed Training at Scale [65.7538150168154]
Training in the presence of peers requires specialized distributed training algorithms with Byzantine tolerance.
We propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency.
arXiv Detail & Related papers (2021-06-21T17:00:42Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Each subfunction has its own activation pattern, domain, and empirical error.
The empirical error of the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
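The decomposition stated above is easy to verify numerically: group inputs of a ReLU network by their hidden activation pattern (each pattern indexes one linear subfunction) and check that the frequency-weighted per-pattern errors reproduce the overall empirical error. The network, data, and labels below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 2, 8, 500
W1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
W2, b2 = rng.normal(size=(1, h)), rng.normal(size=1)

X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)              # toy binary labels

pre = X @ W1.T + b1                          # hidden pre-activations
pattern = pre > 0                            # activation pattern per input
out = np.maximum(pre, 0) @ W2.T + b2         # network output
err = (np.squeeze(out) > 0).astype(float) != y

# Each distinct pattern indexes one linear subfunction; accumulate the
# count and error sum of the samples falling in each subfunction's domain.
stats = {}
for p, e in zip(map(tuple, pattern), err):
    c, s = stats.get(p, (0, 0.0))
    stats[p] = (c + 1, s + float(e))

# Expectation over subfunctions: frequency-weighted mean of per-pattern errors.
total = sum((c / n) * (s / c) for c, s in stats.values())
print(len(stats), "subfunctions; reconstructed:", total, "direct:", err.mean())
```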
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
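A brute-force illustration of the phenomenon above (the paper's methodology is analytical; this is not it): in an overparameterized linear model, every solution of X w = y interpolates the training labels, so sampling the null-space family of interpolators shows how their test errors distribute around a typical value. The dimensions and label model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, d, n_test = 30, 60, 2000        # overparameterized: d > n_train
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n_train, d))
y = np.sign(X @ w_star)
Xt = rng.normal(size=(n_test, d))
yt = np.sign(Xt @ w_star)

# All w with X w = y interpolate; parameterize them as the min-norm
# solution plus an arbitrary component in the null space of X.
w0, *_ = np.linalg.lstsq(X, y, rcond=None)
_, _, Vt = np.linalg.svd(X, full_matrices=True)
N = Vt[n_train:].T                        # basis of the null space of X

errs = []
for _ in range(500):
    w = w0 + N @ rng.normal(size=d - n_train)
    errs.append(np.mean(np.sign(Xt @ w) != yt))
print("typical test error:", np.median(errs), "worst sampled:", max(errs))
```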
- Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training [30.866440916522826]
Self-training is a classical approach in semi-supervised learning.
We show that self-training iterations gracefully improve the model accuracy even though they do get stuck in sub-optimal fixed points.
We then establish a connection between self-training based semi-supervision and the more general problem of learning with heterogeneous data.
arXiv Detail & Related papers (2020-06-19T08:09:07Z)
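For reference, a minimal pseudo-labeling loop implementing the classical self-training scheme above: train on the labeled set, pseudo-label the unlabeled pool where the classifier is confident, retrain on the union, and repeat. The confidence threshold, data model, and classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_lab, n_unlab = 10, 20, 500
w = rng.normal(size=d)
Xl = rng.normal(size=(n_lab, d)); yl = (Xl @ w > 0).astype(int)
Xu = rng.normal(size=(n_unlab, d))          # unlabeled pool
yu_true = (Xu @ w > 0).astype(int)          # held out, for evaluation only

X_cur, y_cur = Xl, yl
for it in range(5):
    clf = LogisticRegression().fit(X_cur, y_cur)
    conf = clf.predict_proba(Xu).max(axis=1)
    keep = conf > 0.9                        # confident pseudo-labels only
    X_cur = np.vstack([Xl, Xu[keep]])
    y_cur = np.concatenate([yl, clf.predict(Xu[keep])])
    print(f"iter {it}: pseudo-labeled {keep.sum()}, "
          f"acc on pool {np.mean(clf.predict(Xu) == yu_true):.3f}")
```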
- Byzantine-resilient Decentralized Stochastic Gradient Descent [85.15773446094576]
We present an in-depth study of the Byzantine resilience of decentralized learning systems.
We propose UBAR, a novel algorithm to enhance decentralized learning with Byzantine Fault Tolerance.
arXiv Detail & Related papers (2020-02-20T05:11:04Z)
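UBAR combines distance-based and performance-based filtering of neighbor models. The sketch below paraphrases that two-stage idea (keep the neighbors closest to the local model, then keep only those whose loss on a local minibatch is no worse than the local model's); it is not the paper's exact rule, and the task, loss, and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
w_true = rng.normal(size=dim)
X = rng.normal(size=(100, dim))
y = X @ w_true

def loss(w, idx):
    return np.mean((X[idx] @ w - y[idx]) ** 2)

def ubar_style_select(w_own, neighbor_ws, rho, idx):
    """Two-stage selection in the spirit of UBAR: (1) keep the rho fraction
    of neighbors closest to the local model, (2) of those, keep the ones
    whose loss on a local minibatch does not exceed the local model's."""
    dists = [np.linalg.norm(v - w_own) for v in neighbor_ws]
    k = max(1, int(rho * len(neighbor_ws)))
    stage1 = [neighbor_ws[i] for i in np.argsort(dists)[:k]]
    stage2 = [v for v in stage1 if loss(v, idx) <= loss(w_own, idx)]
    pool = stage2 if stage2 else [w_own]     # fall back to the local model
    return np.mean(pool, axis=0)

w_own = rng.normal(size=dim)
neighbors = [w_true + 0.01 * rng.normal(size=dim) for _ in range(6)]
neighbors += [rng.normal(scale=10, size=dim) for _ in range(2)]  # Byzantine
idx = rng.integers(0, 100, size=16)
print(ubar_style_select(w_own, neighbors, rho=0.6, idx=idx))
```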