Byzantine-Resilient Non-Convex Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2012.14368v2
- Date: Fri, 2 Apr 2021 17:25:48 GMT
- Title: Byzantine-Resilient Non-Convex Stochastic Gradient Descent
- Authors: Zeyuan Allen-Zhu, Faeze Ebrahimian, Jerry Li, Dan Alistarh
- Abstract summary: We study adversary-resilient distributed optimization, in which
machines can independently compute stochastic gradients and cooperate to jointly optimize their local objectives, while an $\alpha$-fraction of the machines are Byzantine.
Our algorithm, SafeguardSGD, is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known bounds in the stochastic, distributed
setting when no Byzantine machines are present.
It is very practical: it improves upon the performance of all prior methods when training deep neural networks.
- Score: 61.6382287971982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study adversary-resilient stochastic distributed optimization, in which
$m$ machines can independently compute stochastic gradients, and cooperate to
jointly optimize over their local objective functions. However, an
$\alpha$-fraction of the machines are $\textit{Byzantine}$, in that they may
behave in arbitrary, adversarial ways. We consider a variant of this procedure
in the challenging $\textit{non-convex}$ case. Our main result is a new
algorithm SafeguardSGD which can provably escape saddle points and find
approximate local minima of the non-convex objective. The algorithm is based on
a new concentration filtering technique, and its sample and time complexity
bounds match the best known theoretical bounds in the stochastic, distributed
setting when no Byzantine machines are present.
Our algorithm is very practical: it improves upon the performance of all
prior methods when training deep neural networks, it is relatively lightweight,
and it is the first method to withstand two recently-proposed Byzantine
attacks.
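The following is a minimal, hypothetical sketch of the filtering idea described in the abstract: the server accumulates each worker's reported gradients, compares the accumulated histories against a robust reference (here, the coordinate-wise median over the surviving workers), and permanently drops any worker whose deviation exceeds a concentration-style threshold. The function names, the median reference, and the fixed threshold are illustrative assumptions; this is not the exact SafeguardSGD procedure or its theoretical threshold.
```python
# Hypothetical concentration-style filter for Byzantine-resilient distributed SGD.
# NOT the exact SafeguardSGD algorithm from the paper; threshold and statistics are assumptions.
import numpy as np

def filtered_sgd_step(x, worker_grads, drift, good, threshold, lr=0.01):
    """One server-side update.

    x            : current iterate, shape (d,)
    worker_grads : stochastic gradients reported by the m workers, shape (m, d)
    drift        : running sum of each worker's reported gradients, shape (m, d)
    good         : boolean mask of workers not yet filtered out, shape (m,)
    threshold    : concentration radius beyond which a worker is declared Byzantine
    """
    drift = drift + worker_grads                    # accumulate each worker's reported history
    center = np.median(drift[good], axis=0)         # robust reference trajectory
    dist = np.linalg.norm(drift - center, axis=1)   # each worker's deviation from the reference
    good = good & (dist <= threshold)               # permanently drop workers that deviate too far
    g = worker_grads[good].mean(axis=0)             # average gradients of the surviving workers
    return x - lr * g, drift, good

# Toy usage: m = 10 workers, d = 5 parameters, worker 0 behaves adversarially.
rng = np.random.default_rng(0)
m, d = 10, 5
x, drift, good = np.zeros(d), np.zeros((m, d)), np.ones(m, dtype=bool)
for _ in range(100):
    grads = rng.normal(size=(m, d))                 # stand-in for local stochastic gradients
    grads[0] += 5.0                                 # worker 0 reports systematically biased gradients
    x, drift, good = filtered_sgd_step(x, grads, drift, good, threshold=50.0)
print(good)                                         # worker 0 is filtered out after a few steps
```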
Related papers
- Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization [8.670873561640903]
We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points for nonsmooth nonconvex objectives in decentralized settings.
To our knowledge, this is the first finite-time guarantee for decentralized nonsmooth optimization.
arXiv Detail & Related papers (2024-06-03T16:09:34Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate stationary point in a polynomial number of iterations.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Escaping Saddle Points in Distributed Newton's Method with Communication
Efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-regularized loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it efficiently avoids saddle points and fake local minima.
We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied by viewing the algorithm as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Second-Order Guarantees in Centralized, Federated and Decentralized
Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to efficiently escape saddle points (a minimal perturbed-descent sketch follows this list).
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
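As a companion to the escape-from-saddle-points theme running through the main paper and the related works above, here is a minimal sketch of perturbed gradient descent under assumed parameters: descend normally, and inject a small random perturbation whenever the gradient nearly vanishes, so that strict saddle points are escaped. The test function, step size, noise radius, and stall test are illustrative choices, not taken from any of the listed papers.
```python
# Minimal perturbed gradient descent sketch (assumed parameters, illustrative only).
import numpy as np

def perturbed_gd(grad, x0, lr=0.05, noise_radius=0.1, stall_tol=1e-3, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < stall_tol:
            # Gradient nearly vanishes (possible saddle): inject isotropic noise.
            # Note: in this simple sketch the noise also fires near minima; a practical
            # variant would limit how often perturbations occur.
            x = x + noise_radius * rng.normal(size=x.shape)
        else:
            x = x - lr * g                      # ordinary gradient descent step
    return x

# Example: f(x, y) = x^2 + y^4/4 - y^2 has a strict saddle at the origin
# and local minima at (0, +/- sqrt(2)); its gradient is (2x, y^3 - 2y).
grad_saddle = lambda z: np.array([2.0 * z[0], z[1] ** 3 - 2.0 * z[1]])
print(perturbed_gd(grad_saddle, x0=[0.0, 0.0]))  # ends near one of the minimizers, not the saddle
```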