Escaping Saddle Points in Distributed Newton's Method with Communication efficiency and Byzantine Resilience
- URL: http://arxiv.org/abs/2103.09424v1
- Date: Wed, 17 Mar 2021 03:53:58 GMT
- Title: Escaping Saddle Points in Distributed Newton's Method with Communication efficiency and Byzantine Resilience
- Authors: Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran
- Abstract summary: We consider the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it avoids the saddle points and the fake local minima efficiently.
We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians.
- Score: 49.379254729853756
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study the problem of optimizing a non-convex loss function (with saddle
points) in a distributed framework in the presence of Byzantine machines. We
consider a standard distributed setting with one central machine (parameter
server) communicating with many worker machines. Our proposed algorithm is a
variant of the celebrated cubic-regularized Newton method of Nesterov and
Polyak \cite{nest}, which avoids saddle points efficiently and converges to
local minima. Furthermore, our algorithm resists the presence of Byzantine
machines, which may create \emph{fake local minima} near the saddle points of
the loss function, also known as saddle-point attack. We robustify the
cubic-regularized Newton algorithm such that it avoids the saddle points and
the fake local minima efficiently. Furthermore, being a second-order
algorithm, it has a much lower iteration complexity than its first-order
counterparts, and thus communicates little with the parameter
server. We obtain theoretical guarantees for our proposed scheme under several
settings including approximate (sub-sampled) gradients and Hessians. Moreover,
we validate our theoretical findings with experiments using standard datasets
and several types of Byzantine attacks.
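For intuition, the core update the abstract refers to is the Nesterov-Polyak cubic-regularized step: at iterate $x_t$ with gradient $g_t$ and Hessian $H_t$, the next iterate is $x_{t+1} = x_t + \arg\min_s \big( \langle g_t, s \rangle + \tfrac{1}{2} s^\top H_t s + \tfrac{M}{6}\|s\|^3 \big)$. The sketch below (Python/NumPy) is a minimal, illustrative parameter-server loop built around that step, with a coordinate-wise median standing in as the robust aggregator; the aggregation rule, the inner gradient-descent solver for the cubic subproblem, and all function names are assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np

def cubic_newton_step(g, H, M, n_inner=200, lr=0.05):
    """Approximately minimize the cubic-regularized model
        m(s) = <g, s> + 0.5 * s^T H s + (M/6) * ||s||^3
    by plain gradient descent on s (a simple stand-in for an exact
    cubic-subproblem solver)."""
    s = np.zeros_like(g)
    for _ in range(n_inner):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

def robust_aggregate(messages):
    """Coordinate-wise median of the workers' messages; a standard
    robust aggregator used as an illustrative stand-in for the paper's
    exact Byzantine-filtering rule."""
    return np.median(np.stack(messages), axis=0)

def byzantine_robust_cubic_newton(x0, local_grad, local_hess, workers,
                                  M=10.0, n_rounds=20):
    """Illustrative parameter-server loop: each round, collect
    (possibly sub-sampled) gradients and Hessians from the workers,
    aggregate them robustly, and take one cubic-regularized Newton
    step. local_grad(w, x) and local_hess(w, x) are hypothetical
    callbacks returning worker w's estimates at the current iterate."""
    d = x0.size
    x = x0.copy()
    for _ in range(n_rounds):
        g = robust_aggregate([local_grad(w, x) for w in workers])
        H = robust_aggregate([local_hess(w, x).reshape(-1) for w in workers]).reshape(d, d)
        x = x + cubic_newton_step(g, H, M)
    return x
```

Because each round applies a full second-order update, the number of such rounds (and hence the communication with the parameter server) is much smaller than what first-order methods require, which is the communication-efficiency claim in the abstract.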
Related papers
- Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing [45.475515050909706]
We consider the problem of minimizing a convex function in a massively parallel fashion, where communication between workers is limited.
We propose a scheme where the central node (server) effectively runs a Newton method, offloading its high per-iteration cost.
In our solution, workers produce independently coarse but low-bias estimates of the inverse Hessian, using an adaptive sketching scheme.
arXiv Detail & Related papers (2024-10-02T09:38:04Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms have been developed to solve many problems, including the training of neural networks.
In this paper, we propose two types of decentralized minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove a gradient-complexity guarantee for our method.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning [24.016538592246377]
Communication between the workers and the master node to collect local gradients is a key bottleneck in large-scale learning systems.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and the compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Distributed Saddle-Point Problems: Lower Bounds, Near-Optimal and Robust
Algorithms [125.99533416395765]
This paper focuses on the distributed optimization of saddle point problems.
In the experimental part of the paper, we show the effectiveness of our distributed method in practice.
arXiv Detail & Related papers (2020-10-25T13:13:44Z) - ByGARS: Byzantine SGD with Arbitrary Number of Attackers [8.987350240289757]
We propose two novel computation algorithms for distributed machine learning in the presence of Byzantine adversaries.
In these algorithms, reputation scores for the workers are computed using an auxiliary dataset at the server.
We show that the proposed algorithms are robust to multiple different types of attacks at the same time.
arXiv Detail & Related papers (2020-06-24T01:47:13Z) - Distributed Newton Can Communicate Less and Resist Byzantine Workers [31.271927533726842]
We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines.
We establish the linear-quadratic rate of convergence of COMRADE and show that the communication savings and Byzantine resilience incur only a small statistical error rate for arbitrary convex loss functions.
arXiv Detail & Related papers (2020-06-15T20:16:15Z)
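As a rough illustration of the norm-based filtering idea that such communication-efficient, Byzantine-resilient Newton-type schemes use, the sketch below discards the largest-norm worker updates before averaging. The trimming rule, the trim fraction, and the names are illustrative assumptions, not COMRADE's exact procedure.

```python
import numpy as np

def trimmed_mean_by_norm(updates, trim_fraction=0.2):
    """Drop the largest-norm worker updates (a simple norm-based
    filtering rule) and average the rest; suspiciously large updates
    from Byzantine workers are thereby excluded."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    keep = int(np.ceil((1.0 - trim_fraction) * len(updates)))
    kept_idx = np.argsort(norms)[:keep]  # smallest-norm updates survive
    return np.mean([updates[i] for i in kept_idx], axis=0)

# Hypothetical usage: each worker sends a local (approximate) Newton
# direction; the server filters by norm and applies the averaged step.
# x = x - step_size * trimmed_mean_by_norm(worker_directions)
```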