Byzantine-Robust Decentralized Stochastic Optimization with Stochastic
Gradient Noise-Independent Learning Error
- URL: http://arxiv.org/abs/2308.05292v1
- Date: Thu, 10 Aug 2023 02:14:23 GMT
- Title: Byzantine-Robust Decentralized Stochastic Optimization with Stochastic
Gradient Noise-Independent Learning Error
- Authors: Jie Peng, Weiyu Li, Qing Ling
- Abstract summary: We study Byzantine-robust optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD).
The performance of such a method is affected by an unknown number of Byzantine agents, which behave adversarially during the optimization process.
- Score: 25.15075119957447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies Byzantine-robust stochastic optimization over a
decentralized network, where every agent periodically communicates with its
neighbors to exchange local models, and then updates its own local model by
stochastic gradient descent (SGD). The performance of such a method is affected
by an unknown number of Byzantine agents, which behave adversarially during
the optimization process. To the best of our knowledge, there is no existing
work that simultaneously achieves a linear convergence speed and a small
learning error. We observe that the learning error is largely dependent on the
intrinsic stochastic gradient noise. Motivated by this observation, we
introduce two variance reduction methods, stochastic average gradient algorithm
(SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to
Byzantine-robust decentralized stochastic optimization for eliminating the
negative effect of the stochastic gradient noise. The two resulting methods,
BRAVO-SAGA and BRAVO-LSVRG, enjoy both linear convergence speeds and stochastic
gradient noise-independent learning errors. Such learning errors are optimal
for a class of methods based on total variation (TV)-norm regularization and
stochastic subgradient update. We conduct extensive numerical experiments to
demonstrate their effectiveness under various Byzantine attacks.
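As a rough illustration of the variance-reduction idea, the sketch below combines a SAGA-style gradient estimator with the subgradient of a TV-norm penalty coupling an agent's model to its neighbors'. The function name, step sizes, and update form are illustrative assumptions, not the exact BRAVO-SAGA recursion from the paper.

```python
import numpy as np

def bravo_saga_like_step(x, grad_table, sample_grad, i,
                         neighbor_models, lr=0.01, lam=0.1):
    """One hypothetical agent-level step: SAGA variance reduction plus
    a TV-norm penalty subgradient toward (possibly Byzantine) neighbors."""
    # SAGA estimator: unbiased, and its variance vanishes as the stored
    # gradients converge, removing the noise-dependent learning error.
    g = sample_grad - grad_table[i] + grad_table.mean(axis=0)
    grad_table[i] = sample_grad  # refresh the stored gradient for sample i
    # Subgradient of lam * sum_j ||x - x_j||_1; its boundedness is what
    # limits the damage any single Byzantine neighbor can inflict.
    tv = lam * sum(np.sign(x - xj) for xj in neighbor_models)
    return x - lr * (g + tv)
```

The bounded sign term is the stochastic subgradient of the TV-norm regularizer mentioned in the abstract; swapping the SAGA estimator for an LSVRG-style one would give the analogous BRAVO-LSVRG step.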
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
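For orientation, the bi-level template and the stochastic relaxation described above can be written as follows (notation is assumed for illustration, not taken from the paper):

```latex
% Standard bi-level problem
\min_{\lambda}\; f_{\mathrm{out}}\bigl(\theta^{*}(\lambda), \lambda\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\lambda) \in \arg\min_{\theta} f_{\mathrm{in}}(\theta, \lambda)

% Stochastic relaxation: replace the inner arg min by a smooth
% distribution q_{\lambda} over inner solutions and minimize an
% expected outer loss
\min_{\lambda}\; \mathbb{E}_{\theta \sim q_{\lambda}}\bigl[ f_{\mathrm{out}}(\theta, \lambda) \bigr]
```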
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\bigl( \ln(T) / T^{\,1 - \frac{1}{\alpha}} \bigr)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
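For reference, a plain single-machine AdaGrad step is sketched below; the paper's actual contribution, the federated over-the-air aggregation and its analysis, is not reproduced here.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """Plain AdaGrad: per-coordinate steps shrink with the accumulated
    squared gradients, which drives the ln(T)-type rates quoted above."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum
```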
- The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise [17.493808856903303]
One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability.
In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting.
Central to our analysis is the diminishing rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition.
arXiv Detail & Related papers (2024-01-15T17:20:17Z)
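In standard notation (assumed here, not quoted from the paper), the stochastic approximation iterate and the limiting ODE that the Borkar-Meyn theorem relates it to are:

```latex
x_{n+1} = x_n + a_n \bigl[ h(x_n) + M_{n+1} \bigr],
\qquad
\dot{x}(t) = h\bigl(x(t)\bigr),
```

where the step sizes $a_n$ are diminishing and $M_{n+1}$ is the noise. In the martingale-difference setting $\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = 0$; the Markovian setting studied here drops that conditional-mean-zero structure.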
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We analyze a new graduated optimization algorithm that varies the degree of smoothing through the learning rate and batch size.
arXiv Detail & Related papers (2023-11-15T07:27:40Z)
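A toy sketch of the graduated-optimization view, under the common randomized-smoothing model $f_\delta(x) = \mathbb{E}_u[f(x + \delta u)]$; the schedule and estimator below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def graduated_descent(grad_f, x, deltas=(1.0, 0.3, 0.1, 0.0),
                      steps=200, lr=0.05, seed=0):
    """Optimize successively less-smoothed surrogates f_delta, where
    grad_f(x + delta * u) with u ~ N(0, I) is an unbiased gradient of
    the smoothed objective; larger SGD noise (small batch, large lr)
    plays the role of a larger delta in the implicit version above."""
    rng = np.random.default_rng(seed)
    for delta in deltas:              # anneal the smoothing level down
        for _ in range(steps):
            u = rng.standard_normal(x.shape)
            x = x - lr * grad_f(x + delta * u)
    return x
```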
- Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates [0.0]
This paper studies almost-sure convergence rates of stochastic gradient descent applied to large-scale optimization problems.
In particular, the learning rate is equipped with multiplicative stochasticity.
arXiv Detail & Related papers (2021-10-25T04:27:35Z)
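A minimal sketch of SGD with a multiplicatively perturbed step size; the uniform perturbation and the $1/t$ base schedule are illustrative assumptions, not the distributions analyzed in the paper.

```python
import numpy as np

def sgd_stochastic_lr(grad_f, x, base_lr=0.5, steps=1000, seed=0):
    """SGD whose step size is multiplied by fresh random noise at each
    iteration: the 'multiplicative stochasticity' idea sketched above."""
    rng = np.random.default_rng(seed)
    for t in range(1, steps + 1):
        eta = (base_lr / t) * rng.uniform(0.5, 1.5)  # stochastic step size
        x = x - eta * grad_f(x)
    return x
```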
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
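A toy sketch of the idea: annealed importance weights accumulated along unadjusted Langevin transitions (the Metropolis-Hastings correction dropped, as described above), for a 1-D Gaussian example. The targets, schedule, and step sizes are all assumptions for illustration.

```python
import numpy as np

def log_p0(x):   # proposal: standard normal (known normalizer)
    return -0.5 * x ** 2

def log_p1(x):   # unnormalized target: N(2, 0.5^2) without its constant
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def ais(n_particles=5000, n_steps=50, langevin_lr=0.05, seed=0):
    """Estimate log(Z1/Z0) by annealing from p0 to p1 along a geometric
    path, with unadjusted (hence differentiable) Langevin transitions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)          # start from p0
    log_w = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # importance-weight increment for advancing the annealed density
        log_w += (b - b_prev) * (log_p1(x) - log_p0(x))
        # unadjusted Langevin step targeting pi_b ~ p0^(1-b) * p1^b
        grad = (1 - b) * (-x) + b * (-(x - 2.0) / 0.25)
        x = x + langevin_lr * grad \
              + np.sqrt(2 * langevin_lr) * rng.standard_normal(n_particles)
    # numerically stable log of the mean importance weight
    return np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
```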
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between the workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
Motivated by this, we propose to jointly reduce the stochastic noise and the compression noise so as to improve the Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
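One common way to shrink compression noise is to compress the difference to a slowly varying reference rather than the raw gradient; the sketch below illustrates that generic idea with a top-k compressor. Function names and the reference-update rule are assumptions, not the paper's exact scheme.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates (a common compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compress_and_send(grad, reference, k):
    """Gradient-difference compression: transmit a compressed correction
    to a shared reference, so the compression error shrinks as the
    gradients stabilize; both sender and receiver update the reference."""
    delta = top_k(grad - reference, k)
    reference = reference + delta
    return delta, reference
```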
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
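A minimal illustration on the classic bilinear toy game $f(x, y) = xy$: integrating the continuous-time dynamics with a second-order Runge-Kutta scheme (Heun's method) incurs far less integration error per step than the explicit Euler step that plain simultaneous gradient descent/ascent corresponds to. The toy game and step size are assumptions for illustration.

```python
import numpy as np

def gan_field(z):
    """Continuous-time dynamics of the toy game f(x, y) = x * y:
    the generator descends in x, the discriminator ascends in y."""
    x, y = z
    return np.array([-y, x])

def rk2_step(z, h=0.1):
    """Heun's method (second-order Runge-Kutta): its smaller integration
    error is what the paper credits for more stable GAN training."""
    k1 = gan_field(z)
    k2 = gan_field(z + h * k1)
    return z + 0.5 * h * (k1 + k2)
```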
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this gradient estimation problem that combines sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
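For contrast, here is the standard straight-through estimator, a common baseline in this setting; it is not the paper's sample-analytic method.

```python
import torch

class BinaryST(torch.autograd.Function):
    """Straight-through estimator: binarize on the forward pass, pass the
    gradient through unchanged (where inputs lie in [-1, 1]) on the
    backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped pass-through
```

In a model's forward pass this would be called as `y = BinaryST.apply(x)`.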
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
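A minimal sketch of the optimistic update that such analyses build on, in its plain non-adaptive form; the AdaGrad-style per-coordinate rescaling of the algorithm above is omitted.

```python
import numpy as np

def optimistic_step(x, grad, prev_grad, lr=0.01):
    """Optimistic gradient update: extrapolate using the previous
    gradient to dampen the rotational behavior typical of min-max
    (GAN) training dynamics."""
    return x - lr * (2.0 * grad - prev_grad)
```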
- Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic [5.395127324484869]
SplitSGD is a new dynamic learning rate schedule for stochastic optimization.
The method decreases the learning rate for better adaptation to the local geometry of the objective function.
It incurs essentially no additional computational cost compared with standard SGD.
arXiv Detail & Related papers (2019-10-18T19:38:53Z)
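A rough sketch of the splitting diagnostic idea, under assumed thresholds and step counts: run two SGD threads from the same iterate on independent minibatches, and decay the learning rate once their update directions stop agreeing (a sign that progress is noise-dominated).

```python
import numpy as np

def split_diagnostic(grad_fn, x, lr, steps=20, seed=0):
    """If the two threads' stochastic gradients mostly disagree in
    direction, the iterates are in a noise-dominated phase, so halve
    the learning rate; otherwise keep it. Thresholds are illustrative."""
    rng = np.random.default_rng(seed)
    x1, x2, agree = x.copy(), x.copy(), 0
    for _ in range(steps):
        g1, g2 = grad_fn(x1, rng), grad_fn(x2, rng)  # independent minibatches
        agree += float(np.dot(g1, g2) > 0)
        x1, x2 = x1 - lr * g1, x2 - lr * g2
    return lr / 2 if agree / steps < 0.5 else lr
```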