Escaping Saddle Points in Distributed Newton's Method with Communication efficiency and Byzantine Resilience
- URL: http://arxiv.org/abs/2103.09424v1
- Date: Wed, 17 Mar 2021 03:53:58 GMT
- Title: Escaping Saddle Points in Distributed Newton's Method with Communication efficiency and Byzantine Resilience
- Authors: Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran
- Abstract summary: We consider the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it avoids the saddle points and the fake local minima efficiently.
We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians.
- Score: 49.379254729853756
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study the problem of optimizing a non-convex loss function (with saddle
points) in a distributed framework in the presence of Byzantine machines. We
consider a standard distributed setting with one central machine (parameter
server) communicating with many worker machines. Our proposed algorithm is a
variant of the celebrated cubic-regularized Newton method of Nesterov and
Polyak \cite{nest}, which avoids saddle points efficiently and converges to
local minima. Furthermore, our algorithm resists the presence of Byzantine
machines, which may create \emph{fake local minima} near the saddle points of
the loss function, also known as saddle-point attack. We robustify the
cubic-regularized Newton algorithm such that it avoids the saddle points and
the fake local minima efficiently. Furthermore, being a second-order
algorithm, it has a much lower iteration complexity than its first-order
counterparts, and thus communicates little with the parameter
server. We obtain theoretical guarantees for our proposed scheme under several
settings including approximate (sub-sampled) gradients and Hessians. Moreover,
we validate our theoretical findings with experiments using standard datasets
and several types of Byzantine attacks.
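For intuition, the core update the abstract refers to is the Nesterov-Polyak cubic-regularized step: at iterate $x_t$ with gradient $g_t$ and Hessian $H_t$, the next iterate is $x_{t+1} = x_t + \arg\min_s \big( \langle g_t, s \rangle + \tfrac{1}{2} s^\top H_t s + \tfrac{M}{6}\|s\|^3 \big)$. The sketch below (Python/NumPy) is a minimal, illustrative parameter-server loop built around that step, with a coordinate-wise median standing in as the robust aggregator; the aggregation rule, the inner gradient-descent solver for the cubic subproblem, and all function names are assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np

def cubic_newton_step(g, H, M, n_inner=200, lr=0.05):
    """Approximately minimize the cubic-regularized model
        m(s) = <g, s> + 0.5 * s^T H s + (M/6) * ||s||^3
    by plain gradient descent on s (a simple stand-in for an exact
    cubic-subproblem solver)."""
    s = np.zeros_like(g)
    for _ in range(n_inner):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

def robust_aggregate(messages):
    """Coordinate-wise median of the workers' messages; a standard
    robust aggregator used as an illustrative stand-in for the paper's
    exact Byzantine-filtering rule."""
    return np.median(np.stack(messages), axis=0)

def byzantine_robust_cubic_newton(x0, local_grad, local_hess, workers,
                                  M=10.0, n_rounds=20):
    """Illustrative parameter-server loop: each round, collect
    (possibly sub-sampled) gradients and Hessians from the workers,
    aggregate them robustly, and take one cubic-regularized Newton
    step. local_grad(w, x) and local_hess(w, x) are hypothetical
    callbacks returning worker w's estimates at the current iterate."""
    d = x0.size
    x = x0.copy()
    for _ in range(n_rounds):
        g = robust_aggregate([local_grad(w, x) for w in workers])
        H = robust_aggregate([local_hess(w, x).reshape(-1) for w in workers]).reshape(d, d)
        x = x + cubic_newton_step(g, H, M)
    return x
```

Because each round applies a full second-order update, the number of such rounds (and hence the communication with the parameter server) is much smaller than what first-order methods require, which is the communication-efficiency claim in the abstract.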
Related papers
- Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing [45.475515050909706]
We consider the problem of minimizing a convex function in a massively parallel fashion, where communication between workers is limited.
We propose a scheme where the central node (server) effectively runs a Newton method, offloading its high per-iteration cost.
In our solution, workers produce independently coarse but low-bias estimates of the inverse Hessian, using an adaptive sketching scheme.
arXiv Detail & Related papers (2024-10-02T09:38:04Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms have been developed to solve many problems, including the training of neural networks.
In this paper, we propose two types of decentralized minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove a gradient-complexity guarantee for our method.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning [24.016538592246377]
Communication between the workers and the master node to collect local gradients is a key bottleneck in large-scale learning systems.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and the compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Distributed Saddle-Point Problems: Lower Bounds, Near-Optimal and Robust
Algorithms [125.99533416395765]
This paper focuses on the distributed optimization of saddle point problems.
In the experimental part of the paper, we show the effectiveness of our distributed method in practice.
arXiv Detail & Related papers (2020-10-25T13:13:44Z) - ByGARS: Byzantine SGD with Arbitrary Number of Attackers [8.987350240289757]
We propose two novel computation algorithms for distributed machine learning in the presence of Byzantine adversaries.
In these algorithms, reputation scores for the workers are computed using an auxiliary dataset at the server.
We show that the proposed algorithms are robust to multiple different types of attacks at the same time.
arXiv Detail & Related papers (2020-06-24T01:47:13Z) - Distributed Newton Can Communicate Less and Resist Byzantine Workers [31.271927533726842]
We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines.
We establish the linear-quadratic rate of convergence of COMRADE and show that the communication savings and Byzantine resilience incur only a small statistical error rate for arbitrary convex loss functions.
arXiv Detail & Related papers (2020-06-15T20:16:15Z)
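As a rough illustration of the norm-based filtering idea that such communication-efficient, Byzantine-resilient Newton-type schemes use, the sketch below discards the largest-norm worker updates before averaging. The trimming rule, the trim fraction, and the names are illustrative assumptions, not COMRADE's exact procedure.

```python
import numpy as np

def trimmed_mean_by_norm(updates, trim_fraction=0.2):
    """Drop the largest-norm worker updates (a simple norm-based
    filtering rule) and average the rest; suspiciously large updates
    from Byzantine workers are thereby excluded."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    keep = int(np.ceil((1.0 - trim_fraction) * len(updates)))
    kept_idx = np.argsort(norms)[:keep]  # smallest-norm updates survive
    return np.mean([updates[i] for i in kept_idx], axis=0)

# Hypothetical usage: each worker sends a local (approximate) Newton
# direction; the server filters by norm and applies the averaged step.
# x = x - step_size * trimmed_mean_by_norm(worker_directions)
```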