Byzantine-Resilient Non-Convex Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2012.14368v2
- Date: Fri, 2 Apr 2021 17:25:48 GMT
- Title: Byzantine-Resilient Non-Convex Stochastic Gradient Descent
- Authors: Zeyuan Allen-Zhu, Faeze Ebrahimian, Jerry Li, Dan Alistarh
- Abstract summary: We study adversary-resilient distributed optimization, in which
machines can independently compute stochastic gradients and cooperate to jointly optimize their local objectives, while an $\alpha$-fraction of the machines are Byzantine.
Our algorithm, SafeguardSGD, is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known bounds in the stochastic, distributed
setting when no Byzantine machines are present.
It is very practical: it improves upon the performance of all prior methods when training deep neural networks.
- Score: 61.6382287971982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study adversary-resilient stochastic distributed optimization, in which
$m$ machines can independently compute stochastic gradients, and cooperate to
jointly optimize over their local objective functions. However, an
$\alpha$-fraction of the machines are $\textit{Byzantine}$, in that they may
behave in arbitrary, adversarial ways. We consider a variant of this procedure
in the challenging $\textit{non-convex}$ case. Our main result is a new
algorithm SafeguardSGD which can provably escape saddle points and find
approximate local minima of the non-convex objective. The algorithm is based on
a new concentration filtering technique, and its sample and time complexity
bounds match the best known theoretical bounds in the stochastic, distributed
setting when no Byzantine machines are present.
Our algorithm is very practical: it improves upon the performance of all
prior methods when training deep neural networks, it is relatively lightweight,
and it is the first method to withstand two recently-proposed Byzantine
attacks.
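The following is a minimal, hypothetical sketch of the filtering idea described in the abstract: the server accumulates each worker's reported gradients, compares the accumulated histories against a robust reference (here, the coordinate-wise median over the surviving workers), and permanently drops any worker whose deviation exceeds a concentration-style threshold. The function names, the median reference, and the fixed threshold are illustrative assumptions; this is not the exact SafeguardSGD procedure or its theoretical threshold.
```python
# Hypothetical concentration-style filter for Byzantine-resilient distributed SGD.
# NOT the exact SafeguardSGD algorithm from the paper; threshold and statistics are assumptions.
import numpy as np

def filtered_sgd_step(x, worker_grads, drift, good, threshold, lr=0.01):
    """One server-side update.

    x            : current iterate, shape (d,)
    worker_grads : stochastic gradients reported by the m workers, shape (m, d)
    drift        : running sum of each worker's reported gradients, shape (m, d)
    good         : boolean mask of workers not yet filtered out, shape (m,)
    threshold    : concentration radius beyond which a worker is declared Byzantine
    """
    drift = drift + worker_grads                    # accumulate each worker's reported history
    center = np.median(drift[good], axis=0)         # robust reference trajectory
    dist = np.linalg.norm(drift - center, axis=1)   # each worker's deviation from the reference
    good = good & (dist <= threshold)               # permanently drop workers that deviate too far
    g = worker_grads[good].mean(axis=0)             # average gradients of the surviving workers
    return x - lr * g, drift, good

# Toy usage: m = 10 workers, d = 5 parameters, worker 0 behaves adversarially.
rng = np.random.default_rng(0)
m, d = 10, 5
x, drift, good = np.zeros(d), np.zeros((m, d)), np.ones(m, dtype=bool)
for _ in range(100):
    grads = rng.normal(size=(m, d))                 # stand-in for local stochastic gradients
    grads[0] += 5.0                                 # worker 0 reports systematically biased gradients
    x, drift, good = filtered_sgd_step(x, grads, drift, good, threshold=50.0)
print(good)                                         # worker 0 is filtered out after a few steps
```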
Related papers
- Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization [8.670873561640903]
We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points for nonsmooth nonconvex objectives in decentralized settings.
To our knowledge, this is the first finite-time guarantee for decentralized nonsmooth optimization.
arXiv Detail & Related papers (2024-06-03T16:09:34Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate stationary point in a polynomial number of iterations.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Escaping Saddle Points in Distributed Newton's Method with Communication
Efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-regularized loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it efficiently avoids saddle points and fake local minima.
We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied by viewing the algorithm as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Second-Order Guarantees in Centralized, Federated and Decentralized
Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to efficiently escape saddle points (a minimal perturbed-descent sketch follows this list).
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
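As a companion to the escape-from-saddle-points theme running through the main paper and the related works above, here is a minimal sketch of perturbed gradient descent under assumed parameters: descend normally, and inject a small random perturbation whenever the gradient nearly vanishes, so that strict saddle points are escaped. The test function, step size, noise radius, and stall test are illustrative choices, not taken from any of the listed papers.
```python
# Minimal perturbed gradient descent sketch (assumed parameters, illustrative only).
import numpy as np

def perturbed_gd(grad, x0, lr=0.05, noise_radius=0.1, stall_tol=1e-3, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < stall_tol:
            # Gradient nearly vanishes (possible saddle): inject isotropic noise.
            # Note: in this simple sketch the noise also fires near minima; a practical
            # variant would limit how often perturbations occur.
            x = x + noise_radius * rng.normal(size=x.shape)
        else:
            x = x - lr * g                      # ordinary gradient descent step
    return x

# Example: f(x, y) = x^2 + y^4/4 - y^2 has a strict saddle at the origin
# and local minima at (0, +/- sqrt(2)); its gradient is (2x, y^3 - 2y).
grad_saddle = lambda z: np.array([2.0 * z[0], z[1] ** 3 - 2.0 * z[1]])
print(perturbed_gd(grad_saddle, x0=[0.0, 0.0]))  # ends near one of the minimizers, not the saddle
```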