A simplified convergence theory for Byzantine resilient stochastic
gradient descent
- URL: http://arxiv.org/abs/2208.11879v1
- Date: Thu, 25 Aug 2022 05:37:14 GMT
- Title: A simplified convergence theory for Byzantine resilient stochastic
gradient descent
- Authors: Lindon Roberts, Edward Smyth
- Abstract summary: In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples.
In the presence of one or more malicious servers, standard algorithms for training model such as gradient descent (SGD) fail to converge.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In distributed learning, a central server trains a model according to updates
provided by nodes holding local data samples. In the presence of one or more
malicious servers sending incorrect information (a Byzantine adversary),
standard algorithms for model training such as stochastic gradient descent
(SGD) fail to converge. In this paper, we present a simplified convergence
theory for the generic Byzantine Resilient SGD method originally proposed by
Blanchard et al. [NeurIPS 2017]. Compared to the existing analysis, we shown
convergence to a stationary point in expectation under standard assumptions on
the (possibly nonconvex) objective function and flexible assumptions on the
stochastic gradients.
Related papers
- Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph [24.44321658238713]
Graph Neural Networks (GNNs) are widely used for node classification tasks but often fail to generalize when training and test nodes come from different distributions.
We introduce the Topology-Aware Dynamic Reweighting (TAR) framework, which dynamically adjusts sample weights through gradient flow in the Wasserstein space during training.
Our framework's superiority is demonstrated through standard testing on four graph OOD datasets and three class-imbalanced node classification datasets.
arXiv Detail & Related papers (2024-06-03T07:32:05Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of gradient Forward Mirror (SMD) in the contemporary non optimization setting.
For the training, we develop provably convergent algorithms for the problem of linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $epsilon$-stationary points with $O(epsilon-4)$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(epsilon-3)$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - A Convergence Theory for Federated Average: Beyond Smoothness [28.074273047592065]
Federated learning enables a large amount of edge computing devices to learn a model without data sharing jointly.
As a leading algorithm in this setting, Federated Average FedAvg, which runs Gradient Descent (SGD) in parallel on local devices, has been widely used.
This paper provides a theoretical convergence study on Federated Learning.
arXiv Detail & Related papers (2022-11-03T04:50:49Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - A High Probability Analysis of Adaptive SGD with Momentum [22.9530287983179]
Gradient Descent (DSG) and its variants are the most used algorithms in machine learning applications.
We show for the first time the probability of the gradients to zero in smooth non setting for DelayedGrad with momentum.
arXiv Detail & Related papers (2020-07-28T15:06:22Z) - Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning [24.12941820827126]
We propose a biased gradient descent (BSGD) for Conditional optimization problems.
Our lower bound analysis shows that BSGD cannot be improved for general convex objectives non objectives.
For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound.
arXiv Detail & Related papers (2020-02-25T10:57:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.