Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy
- URL: http://arxiv.org/abs/2108.11769v1
- Date: Thu, 26 Aug 2021 13:04:28 GMT
- Title: Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy
- Authors: Nirupam Gupta, Thinh T. Doan and Nitin Vaidya
- Abstract summary: We consider the problem of Byzantine fault-tolerance in federated machine learning.
In the fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions.
We show that, under $2f$-redundancy, the federated local algorithm with CE can indeed obtain exact fault-tolerance.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Byzantine fault-tolerance in federated machine
learning. In this problem, the system comprises multiple agents each with local
data, and a trusted centralized coordinator. In the fault-free setting, the agents
collaborate with the coordinator to find a minimizer of the aggregate of their
local cost functions defined over their local data. We consider a scenario
where some agents ($f$ out of $N$) are Byzantine faulty. Such agents need not
follow a prescribed algorithm correctly, and may communicate arbitrary
incorrect information to the coordinator. In the presence of Byzantine agents,
a more reasonable goal for the non-faulty agents is to find a minimizer of the
aggregate cost function of only the non-faulty agents. This particular goal is
commonly referred to as exact fault-tolerance. Recent work has shown that exact
fault-tolerance is achievable if and only if the non-faulty agents satisfy the
property of $2f$-redundancy. Now, under this property, techniques are known to
impart exact fault-tolerance to the distributed implementation of the classical
stochastic gradient-descent (SGD) algorithm. However, we do not know of any
such techniques for the federated local SGD algorithm - a more commonly used
method for federated machine learning. To address this issue, we propose a
novel technique named comparative elimination (CE). We show that, under
$2f$-redundancy, the federated local SGD algorithm with CE can indeed obtain
exact fault-tolerance in the deterministic setting when the non-faulty agents
can accurately compute gradients of their local cost functions. In the general
stochastic case, when agents can only compute unbiased noisy estimates of their
local gradients, our algorithm achieves approximate fault-tolerance with
approximation error proportional to the variance of stochastic gradients and
the fraction of Byzantine agents.
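The abstract invokes $2f$-redundancy without defining it. Paraphrasing the characterization used in the authors' prior work on redundancy in distributed optimization (not the exact formulation in this paper), the property states that any $N - 2f$ of the non-faulty agents already determine the common minimizer:

```latex
% 2f-redundancy (paraphrase): \mathcal{H} = set of non-faulty agents,
% Q_i = local cost function of agent i. Every subset of at least N - 2f
% non-faulty agents has the same aggregate minimizer as the full
% non-faulty set:
\arg\min_{x} \sum_{i \in S} Q_i(x)
  \;=\; \arg\min_{x} \sum_{i \in \mathcal{H}} Q_i(x),
\qquad \forall\, S \subseteq \mathcal{H},\ |S| \ge N - 2f .
```

Intuitively, this redundancy is what lets the coordinator discard up to $f$ suspicious updates without losing information about the true minimizer.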
Related papers
- Proportional Fairness in Non-Centroid Clustering [34.91134564352282]
We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees.
We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster.
We show that the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm, is unappealing for a natural loss function.
arXiv Detail & Related papers (2024-10-30T17:53:49Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Impact of Redundancy on Resilience in Distributed Optimization and
Learning [4.664766612388049]
This report considers the problem of resilient distributed optimization and learning in a server-based architecture.
The system comprises a server and multiple agents, where each agent has its own local cost function.
We show that $(f, r; \epsilon)$-resilience can indeed be achieved in practice, given that the local cost functions are sufficiently redundant.
arXiv Detail & Related papers (2022-11-16T02:23:34Z) - $\texttt{FedBC}$: Calibrating Global and Local Models via Federated
Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Utilizing Redundancy in Cost Functions for Resilience in Distributed
Optimization and Learning [1.8414221462731502]
This paper considers the problem of resilient distributed optimization and machine learning in a server-based architecture.
The system comprises a server and multiple agents, where each agent has a local cost function.
We consider the case when some of the agents may be asynchronous and/or Byzantine faulty.
arXiv Detail & Related papers (2021-10-21T02:41:19Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to optimize jointly.
Our algorithm is based on a new concentration technique and comes with sample-complexity guarantees.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic
Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE) [2.582633087903328]
We consider the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method.
We propose a norm-based gradient-filter, named comparative gradient elimination (CGE)
We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard assumptions.
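The abstract describes CGE only as a norm-based gradient-filter. A minimal sketch of how such a filter might operate, assuming (as in the norm-based filters of this line of work) that the server sorts received gradients by Euclidean norm, discards the $f$ largest, and averages the rest:

```python
import numpy as np

def cge_filter(gradients, f):
    """Hypothetical norm-based gradient elimination: sort the received
    gradients by Euclidean norm, discard the f largest, and average the
    remaining n - f gradients."""
    norms = [np.linalg.norm(g) for g in gradients]
    keep = np.argsort(norms)[: len(gradients) - f]  # indices of n - f smallest
    return np.mean([gradients[i] for i in keep], axis=0)

# One aggregation step: 4 honest gradients plus 1 Byzantine gradient of
# huge norm, which the filter should discard.
honest = [np.array([0.9, 1.0]), np.array([1.0, 1.1]),
          np.array([1.1, 0.9]), np.array([1.0, 1.0])]
byzantine = [np.array([-1e6, 1e6])]
g = cge_filter(honest + byzantine, f=1)  # averages only the honest gradients
```

The filter needs no bound on the Byzantine values themselves, only knowledge of the upper bound $f$ on the number of faulty agents.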
arXiv Detail & Related papers (2020-08-11T13:51:16Z) - Communication-Efficient Robust Federated Learning Over Heterogeneous
Datasets [147.11434031193164]
This work investigates fault-resilient federated learning when the data samples are non-uniformly distributed across workers.
In the presence of adversarially faulty workers who may strategically corrupt datasets, the local messages exchanged can be unreliable.
The present work introduces a fault-resilient gradient (FRPG) algorithm that relies on Nesterov's acceleration technique.
For strongly convex loss functions, FRPG and LFRPG have provably faster convergence rates than a benchmark robust aggregation algorithm.
arXiv Detail & Related papers (2020-06-17T16:50:33Z) - Byzantine-Robust Decentralized Stochastic Optimization over Static and
Time-Varying Networks [25.15075119957447]
We consider the Byzantine-robust optimization problem defined over decentralized static and time-varying networks.
Some of the agents are unreliable due to data corruptions, equipment failures or cyber-attacks.
Our key idea to handle the Byzantine attacks is to formulate a total variation (TV) norm-penalized approximation of the Byzantine-free problem.
We prove that the proposed method reaches a neighborhood of the Byzantine-free optimal solution, and the size of neighborhood is determined by the number of Byzantine agents and the network topology.
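The TV penalty idea above can be illustrated on a toy decentralized problem. The sketch below is an assumption-laden paraphrase, not the paper's algorithm: each honest agent takes a subgradient step on its local quadratic loss plus $\lambda \sum_j |x_i - x_j|$, whose subgradient is a bounded sign term, so a Byzantine neighbor broadcasting arbitrary values can only bias each step by at most $\lambda$:

```python
import numpy as np

# Hypothetical toy instance: 4 honest agents minimise (x - 1)^2 over a
# fully connected network; a 5th Byzantine agent always broadcasts a huge
# value. lam is the TV-penalty weight, lr the step size.
n_honest, target, lam, lr = 4, 1.0, 0.1, 0.05
x = np.zeros(n_honest)          # honest agents' local states
byzantine_msg = 1e3             # arbitrary faulty message, fixed each round

for _ in range(2000):
    msgs = np.append(x, byzantine_msg)  # values received from all peers
    # Synchronous update: local gradient plus subgradient of the TV penalty
    # lam * sum_j |x_i - x_j|, which is lam * sum_j sign(x_i - x_j).
    grad = np.array([(x[i] - target)
                     + lam * np.sum(np.sign(x[i] - np.delete(msgs, i)))
                     for i in range(n_honest)])
    x = x - lr * grad
```

Consistent with the abstract's claim, the honest agents end up in a neighborhood of the Byzantine-free optimum (here $x^* = 1$) whose size is governed by the penalty weight and the single faulty neighbor, rather than by the magnitude of the faulty messages.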
arXiv Detail & Related papers (2020-05-12T04:18:39Z) - Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic
Segmentation [63.75774438196315]
Unsupervised domain adaptation (UDA) aims to adapt existing models of the source domain to a new target domain with only unlabeled data.
Most existing methods suffer from noticeable negative transfer resulting from either the error-prone discriminator network or the unreasonable teacher model.
We propose an uncertainty-aware consistency regularization method for cross-domain semantic segmentation.
arXiv Detail & Related papers (2020-04-19T15:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.