Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient
- URL: http://arxiv.org/abs/2303.04560v1
- Date: Wed, 8 Mar 2023 13:20:49 GMT
- Title: Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient
- Authors: Nikita Fedin, Eduard Gorbunov
- Abstract summary: We propose a new method -- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient (BR-LSVRG).
We derive non-asymptotic convergence guarantees for the new method in the strongly convex case.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed optimization with open collaboration is a popular field since it
gives small groups, companies, universities, and individuals the opportunity to
jointly solve huge-scale problems. However, standard optimization algorithms are
fragile in such settings due to the possible presence of so-called Byzantine
workers -- participants that can send (intentionally or not) incorrect information
instead of the information prescribed by the protocol (e.g., anti-gradients instead
of stochastic gradients). Thus,
the problem of designing distributed methods with provable robustness to
Byzantine workers has been receiving a lot of attention recently. In
particular, several works consider a very promising way to achieve Byzantine
tolerance via exploiting variance reduction and robust aggregation. The
existing approaches use SAGA- and SARAH-type variance-reduced estimators, while
another popular estimator -- SVRG -- is not studied in the context of
Byzantine-robustness. In this work, we close this gap in the literature and
propose a new method -- Byzantine-Robust Loopless Stochastic Variance-Reduced
Gradient (BR-LSVRG). We derive non-asymptotic convergence guarantees for the
new method in the strongly convex case and compare its performance with
existing approaches in numerical experiments.
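To make the combination described above concrete, here is a minimal NumPy sketch of a loopless SVRG-type estimator on each worker paired with a robust aggregation rule at the server. It is an illustration of the general idea only: the aggregator (coordinate-wise median), the function names, and the step details are assumptions for exposition and are not taken from the paper.

```python
import numpy as np

def coordinate_wise_median(vectors):
    """One simple robust aggregation rule: the coordinate-wise median of the
    workers' messages (an illustrative choice, not necessarily the paper's)."""
    return np.median(np.stack(vectors), axis=0)

def br_lsvrg_style_step(x, w, workers, sample_grad, full_grad, n_samples, p, lr, rng):
    """One round of a loopless-SVRG-type estimator combined with robust
    aggregation -- a sketch of the general idea, not the paper's exact method.

    sample_grad(i, j, point): gradient of worker i's j-th sample at `point`
    full_grad(i, point):      full local gradient of worker i at `point`
                              (in practice cached and recomputed only when w changes)
    p:                        probability of refreshing the reference point w
    """
    msgs = []
    for i in workers:
        j = rng.integers(n_samples)  # the same sample index is used in both terms
        g = sample_grad(i, j, x) - sample_grad(i, j, w) + full_grad(i, w)
        msgs.append(g)               # a Byzantine worker could send an arbitrary vector here

    g_hat = coordinate_wise_median(msgs)  # robust aggregation instead of plain averaging
    x_new = x - lr * g_hat

    # "Loopless": refresh the reference point with probability p each round
    # instead of on a fixed inner-loop schedule as in classic SVRG.
    w_new = x_new if rng.random() < p else w
    return x_new, w_new
```

With plain averaging in place of the median this reduces to an ordinary distributed loopless SVRG step; the robust aggregator is what limits the influence of Byzantine messages.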
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients used to derive good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
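For reference, the clipping mentioned here is the standard operation of rescaling a stochastic gradient whose norm exceeds a threshold; a minimal sketch (function and parameter names are illustrative):

```python
import numpy as np

def clip(g, lam):
    """Return g unchanged if ||g|| <= lam, otherwise rescale it to norm lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)
```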
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
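As a reminder of the base scheme these variants extend, one SGDA step for min_x max_y f(x, y) descends in x and ascends in y using stochastic gradients; a minimal sketch (names are illustrative, not the paper's pseudocode):

```python
def sgda_step(x, y, stoch_grad_x, stoch_grad_y, lr_x, lr_y):
    """One stochastic gradient descent-ascent step for min_x max_y f(x, y):
    descend in x, ascend in y, each using a stochastic gradient."""
    gx = stoch_grad_x(x, y)  # stochastic estimate of grad_x f(x, y)
    gy = stoch_grad_y(x, y)  # stochastic estimate of grad_y f(x, y)
    return x - lr_x * gx, y + lr_y * gy
```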
arXiv Detail & Related papers (2022-02-15T09:17:39Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Stochastic Alternating Direction Method of Multipliers for
Byzantine-Robust Distributed Learning [22.835940007753376]
We propose a Byzantine-robust alternating direction method of multipliers (ADMM) that fully utilizes the separable problem structure.
Theoretically, we prove that the proposed method converges to a bounded neighborhood of the optimal solution at a rate of O(1/k) under mild assumptions.
Numerical experiments on the MNIST and COVERTYPE datasets demonstrate the effectiveness of the proposed method to various Byzantine attacks.
arXiv Detail & Related papers (2021-06-13T01:17:31Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute stochastic gradients and cooperate to jointly optimize.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds in the setting where no Byzantine machines are present.
It is also very practical: it improves upon the performance of all prior methods.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Byzantine-Robust Variance-Reduced Federated Learning over Distributed
Non-i.i.d. Data [36.99547890386817]
We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.)
An unknown number of Byzantine workers may send malicious messages to the central node, leading to significant learning error.
Most Byzantine-robust methods address this issue by applying robust aggregation rules to the received messages.
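A typical example of such a robust aggregation rule is the coordinate-wise trimmed mean, sketched below; it is shown only as an illustration of the idea, not as the specific rule used in that paper.

```python
import numpy as np

def trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean: in every coordinate, drop the b largest
    and b smallest values across workers, then average the rest.
    Requires len(vectors) > 2 * b."""
    stacked = np.sort(np.stack(vectors), axis=0)  # sort each coordinate separately
    return stacked[b:len(vectors) - b].mean(axis=0)
```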
arXiv Detail & Related papers (2020-09-17T09:09:23Z) - Byzantine-Robust Decentralized Stochastic Optimization over Static and
Time-Varying Networks [25.15075119957447]
We consider the Byzantine-robust optimization problem defined over decentralized static and time-varying networks.
Some of the agents are unreliable due to data corruptions, equipment failures or cyber-attacks.
Our key idea to handle the Byzantine attacks is to formulate a total variation (TV) norm-penalized approximation of the Byzantine-free problem.
We prove that the proposed method reaches a neighborhood of the Byzantine-free optimal solution, and the size of the neighborhood is determined by the number of Byzantine agents and the network topology.
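Roughly, such a TV-norm-penalized approximation gives each agent a local copy of the decision variable and replaces exact consensus with an l1 penalty on the differences between neighbors. A sketch under assumed notation (the exact formulation and the choice of penalized edges are given in the paper):

```latex
% Sketch of a TV-norm-penalized approximation (assumed notation):
% \mathcal{R} = regular agents, \mathcal{E} = communication links, \lambda > 0.
\min_{\{x_i\}} \;\; \sum_{i \in \mathcal{R}} f_i(x_i)
  \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} \lVert x_i - x_j \rVert_1
```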
arXiv Detail & Related papers (2020-05-12T04:18:39Z) - Federated Variance-Reduced Stochastic Gradient Descent with Robustness
to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
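For contrast with the SVRG-type estimator used in the main paper above, a SAGA-type estimator keeps a per-sample table of previously computed gradients instead of a single reference point; a minimal per-worker sketch (illustrative only, not Byrd-SAGA's exact implementation):

```python
import numpy as np

def saga_estimator(x, j, sample_grad, table):
    """SAGA-type variance-reduced estimator for a single worker (sketch).

    table: np.ndarray of shape (n, d) holding the last gradient computed
           for each of the worker's n samples; updated in place.
    """
    g_new = sample_grad(j, x)                   # fresh gradient of sample j
    g = g_new - table[j] + table.mean(axis=0)   # variance-reduced estimate
    table[j] = g_new                            # refresh the stored gradient
    return g

# e.g., initialize table = np.zeros((n_samples, dim)) before the first iteration
```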
arXiv Detail & Related papers (2019-12-29T19:46:03Z)