Distributed Newton Can Communicate Less and Resist Byzantine Workers
- URL: http://arxiv.org/abs/2006.08737v1
- Date: Mon, 15 Jun 2020 20:16:15 GMT
- Title: Distributed Newton Can Communicate Less and Resist Byzantine Workers
- Authors: Avishek Ghosh, Raj Kumar Maity and Arya Mazumdar
- Abstract summary: We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines.
We establish the linear-quadratic convergence rate of COMRADE and show that the communication savings and Byzantine resilience result in only a small statistical error rate for arbitrary convex loss functions.
- Score: 31.271927533726842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a distributed second order optimization algorithm that is
communication-efficient as well as robust against Byzantine failures of the
worker machines. We propose COMRADE (COMunication-efficient and Robust
Approximate Distributed nEwton), an iterative second order algorithm, where the
worker machines communicate only once per iteration with the center machine.
This is in sharp contrast with the state-of-the-art distributed second order
algorithms like GIANT [34] and DINGO [7], where the worker machines send
(functions of) local gradients and Hessians sequentially, and thus end up
communicating twice with the center machine per iteration. Moreover, we show
that the worker machines can further compress the local information before
sending it to the center. In addition, we employ a simple norm-based
thresholding rule to filter out the Byzantine worker machines. We establish the
linear-quadratic rate of convergence of COMRADE and show that the
communication savings and Byzantine resilience result in only a small
statistical error rate for arbitrary convex loss functions. To the best of our
knowledge, this is the first work that addresses the issue of Byzantine
resilience in second order distributed optimization. Furthermore, we validate
our theoretical results with extensive experiments on synthetic and benchmark
LIBSVM [5] data-sets and demonstrate convergence guarantees.
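The one-round communication pattern and the norm-based filter described above are simple enough to illustrate. The following is a minimal sketch, not the paper's implementation: each worker computes an approximate Newton direction on its local shard and sends only that vector, and the center discards the largest-norm directions before averaging. The ridge-regularized least-squares loss, the trim fraction, and all function names are illustrative assumptions.

```python
import numpy as np

def norm_filtered_average(updates, byz_frac):
    """Hypothetical norm-based thresholding filter: drop the `byz_frac`
    largest-norm update directions and average the rest."""
    updates = np.asarray(updates)                 # shape: (workers, dim)
    norms = np.linalg.norm(updates, axis=1)
    keep = max(1, int(np.ceil((1.0 - byz_frac) * len(updates))))
    kept_idx = np.argsort(norms)[:keep]           # smallest norms survive
    return updates[kept_idx].mean(axis=0)

def local_newton_direction(X, y, w, lam=1e-3):
    """One worker's Newton direction for ridge-regularized least squares
    on its local shard (an illustrative choice of loss)."""
    n, d = X.shape
    grad = X.T @ (X @ w - y) / n + lam * w
    hess = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(hess, grad)

# Toy driver: each worker communicates once per iteration (its Newton
# direction); the center filters by norm and takes the averaged step.
rng = np.random.default_rng(0)
dim, workers, n_local = 5, 8, 200
w_star = rng.normal(size=dim)
shards = []
for _ in range(workers):
    X = rng.normal(size=(n_local, dim))
    shards.append((X, X @ w_star + 0.01 * rng.normal(size=n_local)))

w = np.zeros(dim)
for _ in range(20):
    dirs = [local_newton_direction(X, y, w) for X, y in shards]
    dirs[0] = 1e3 * rng.normal(size=dim)          # a Byzantine worker sends garbage
    w = w - norm_filtered_average(dirs, byz_frac=0.25)

print("distance to w_star:", np.linalg.norm(w - w_star))
```

The sketch only captures the communication and filtering pattern; the convergence rate and statistical error bounds quoted in the abstract are properties of the full algorithm and analysis in the paper.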
Related papers
- Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing [45.475515050909706]
We consider the problem of minimizing a convex function in a massively parallel fashion, where communication between workers is limited.
We propose a scheme where the central node (server) effectively runs a Newton method, offloading its high per-iteration cost.
In our solution, workers produce independently coarse but low-bias estimates of the inverse Hessian, using an adaptive sketching scheme.
arXiv Detail & Related papers (2024-10-02T09:38:04Z) - Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication preservation.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z) - CORE: Common Random Reconstruction for Distributed Optimization with
Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines.
We propose Common Random Reconstruction (CORE), which can be used to compress information transmitted between machines.
arXiv Detail & Related papers (2023-09-23T08:45:27Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Escaping Saddle Points in Distributed Newton's Method with Communication
efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-regularized loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it efficiently avoids saddle points and fake local minima.
We obtain theoretical guarantees for our proposed scheme under several approximate settings, including sub-sampled gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique and comes with sample complexity guarantees.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost (a brief sketch of the idea follows this list).
arXiv Detail & Related papers (2020-06-16T17:58:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.