Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy
- URL: http://arxiv.org/abs/2108.11769v1
- Date: Thu, 26 Aug 2021 13:04:28 GMT
- Title: Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy
- Authors: Nirupam Gupta, Thinh T. Doan and Nitin Vaidya
- Abstract summary: We consider the problem of Byzantine fault-tolerance in federated machine learning.
In the fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions.
We show that, under $2f$-redundancy, the federated local algorithm with CE can indeed obtain exact fault-tolerance.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Byzantine fault-tolerance in federated machine
learning. In this problem, the system comprises multiple agents each with local
data, and a trusted centralized coordinator. In the fault-free setting, the agents
collaborate with the coordinator to find a minimizer of the aggregate of their
local cost functions defined over their local data. We consider a scenario
where some agents ($f$ out of $N$) are Byzantine faulty. Such agents need not
follow a prescribed algorithm correctly, and may communicate arbitrary
incorrect information to the coordinator. In the presence of Byzantine agents,
a more reasonable goal for the non-faulty agents is to find a minimizer of the
aggregate cost function of only the non-faulty agents. This particular goal is
commonly referred to as exact fault-tolerance. Recent work has shown that exact
fault-tolerance is achievable if and only if the non-faulty agents satisfy the
property of $2f$-redundancy. Now, under this property, techniques are known to
impart exact fault-tolerance to the distributed implementation of the classical
stochastic gradient-descent (SGD) algorithm. However, we do not know of any
such techniques for the federated local SGD algorithm - a more commonly used
method for federated machine learning. To address this issue, we propose a
novel technique named comparative elimination (CE). We show that, under
$2f$-redundancy, the federated local SGD algorithm with CE can indeed obtain
exact fault-tolerance in the deterministic setting when the non-faulty agents
can accurately compute gradients of their local cost functions. In the general
stochastic case, when agents can only compute unbiased noisy estimates of their
local gradients, our algorithm achieves approximate fault-tolerance with
approximation error proportional to the variance of stochastic gradients and
the fraction of Byzantine agents.
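The abstract invokes $2f$-redundancy without defining it. Paraphrasing the characterization used in the authors' prior work on redundancy in distributed optimization (not the exact formulation in this paper), the property states that any $N - 2f$ of the non-faulty agents already determine the common minimizer:

```latex
% 2f-redundancy (paraphrase): \mathcal{H} = set of non-faulty agents,
% Q_i = local cost function of agent i. Every subset of at least N - 2f
% non-faulty agents has the same aggregate minimizer as the full
% non-faulty set:
\arg\min_{x} \sum_{i \in S} Q_i(x)
  \;=\; \arg\min_{x} \sum_{i \in \mathcal{H}} Q_i(x),
\qquad \forall\, S \subseteq \mathcal{H},\ |S| \ge N - 2f .
```

Intuitively, this redundancy is what lets the coordinator discard up to $f$ suspicious updates without losing information about the true minimizer.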
Related papers
- Proportional Fairness in Non-Centroid Clustering [34.91134564352282]
We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees.
We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster.
We show that the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm, is unappealing for a natural loss function.
arXiv Detail & Related papers (2024-10-30T17:53:49Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Impact of Redundancy on Resilience in Distributed Optimization and
Learning [4.664766612388049]
This report considers the problem of resilient distributed optimization and learning in a server-based architecture.
The system comprises a server and multiple agents, where each agent has its own local cost function.
We show that $(f, r; \epsilon)$-resilience can indeed be achieved in practice, given that the local cost functions are sufficiently redundant.
arXiv Detail & Related papers (2022-11-16T02:23:34Z) - $\texttt{FedBC}$: Calibrating Global and Local Models via Federated
Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Utilizing Redundancy in Cost Functions for Resilience in Distributed
Optimization and Learning [1.8414221462731502]
This paper considers the problem of resilient distributed optimization and machine learning in a server-based architecture.
The system comprises a server and multiple agents, where each agent has a local cost function.
We consider the case when some of the agents may be asynchronous and/or Byzantine faulty.
arXiv Detail & Related papers (2021-10-21T02:41:19Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to optimize jointly.
Our algorithm is based on a new concentration technique and comes with sample-complexity guarantees.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic
Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE) [2.582633087903328]
We consider the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method.
We propose a norm-based gradient-filter, named comparative gradient elimination (CGE)
We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard assumptions.
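The abstract describes CGE only as a norm-based gradient-filter. A minimal sketch of how such a filter might operate, assuming (as in the norm-based filters of this line of work) that the server sorts received gradients by Euclidean norm, discards the $f$ largest, and averages the rest:

```python
import numpy as np

def cge_filter(gradients, f):
    """Hypothetical norm-based gradient elimination: sort the received
    gradients by Euclidean norm, discard the f largest, and average the
    remaining n - f gradients."""
    norms = [np.linalg.norm(g) for g in gradients]
    keep = np.argsort(norms)[: len(gradients) - f]  # indices of n - f smallest
    return np.mean([gradients[i] for i in keep], axis=0)

# One aggregation step: 4 honest gradients plus 1 Byzantine gradient of
# huge norm, which the filter should discard.
honest = [np.array([0.9, 1.0]), np.array([1.0, 1.1]),
          np.array([1.1, 0.9]), np.array([1.0, 1.0])]
byzantine = [np.array([-1e6, 1e6])]
g = cge_filter(honest + byzantine, f=1)  # averages only the honest gradients
```

The filter needs no bound on the Byzantine values themselves, only knowledge of the upper bound $f$ on the number of faulty agents.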
arXiv Detail & Related papers (2020-08-11T13:51:16Z) - Communication-Efficient Robust Federated Learning Over Heterogeneous
Datasets [147.11434031193164]
This work investigates fault-resilient federated learning when the data samples are non-uniformly distributed across workers.
In the presence of adversarially faulty workers who may strategically corrupt datasets, the local messages exchanged can be unreliable.
The present work introduces a fault-resilient gradient (FRPG) algorithm that relies on Nesterov's acceleration technique.
For strongly convex loss functions, FRPG and LFRPG have provably faster convergence rates than a benchmark robust aggregation algorithm.
arXiv Detail & Related papers (2020-06-17T16:50:33Z) - Byzantine-Robust Decentralized Stochastic Optimization over Static and
Time-Varying Networks [25.15075119957447]
We consider the Byzantine-robust optimization problem defined over decentralized static and time-varying networks.
Some of the agents are unreliable due to data corruptions, equipment failures or cyber-attacks.
Our key idea to handle the Byzantine attacks is to formulate a total variation (TV) norm-penalized approximation of the Byzantine-free problem.
We prove that the proposed method reaches a neighborhood of the Byzantine-free optimal solution, and the size of neighborhood is determined by the number of Byzantine agents and the network topology.
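The TV penalty idea above can be illustrated on a toy decentralized problem. The sketch below is an assumption-laden paraphrase, not the paper's algorithm: each honest agent takes a subgradient step on its local quadratic loss plus $\lambda \sum_j |x_i - x_j|$, whose subgradient is a bounded sign term, so a Byzantine neighbor broadcasting arbitrary values can only bias each step by at most $\lambda$:

```python
import numpy as np

# Hypothetical toy instance: 4 honest agents minimise (x - 1)^2 over a
# fully connected network; a 5th Byzantine agent always broadcasts a huge
# value. lam is the TV-penalty weight, lr the step size.
n_honest, target, lam, lr = 4, 1.0, 0.1, 0.05
x = np.zeros(n_honest)          # honest agents' local states
byzantine_msg = 1e3             # arbitrary faulty message, fixed each round

for _ in range(2000):
    msgs = np.append(x, byzantine_msg)  # values received from all peers
    # Synchronous update: local gradient plus subgradient of the TV penalty
    # lam * sum_j |x_i - x_j|, which is lam * sum_j sign(x_i - x_j).
    grad = np.array([(x[i] - target)
                     + lam * np.sum(np.sign(x[i] - np.delete(msgs, i)))
                     for i in range(n_honest)])
    x = x - lr * grad
```

Consistent with the abstract's claim, the honest agents end up in a neighborhood of the Byzantine-free optimum (here $x^* = 1$) whose size is governed by the penalty weight and the single faulty neighbor, rather than by the magnitude of the faulty messages.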
arXiv Detail & Related papers (2020-05-12T04:18:39Z) - Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic
Segmentation [63.75774438196315]
Unsupervised domain adaptation (UDA) aims to adapt existing models of the source domain to a new target domain with only unlabeled data.
Most existing methods suffer from noticeable negative transfer resulting from either the error-prone discriminator network or the unreasonable teacher model.
We propose an uncertainty-aware consistency regularization method for cross-domain semantic segmentation.
arXiv Detail & Related papers (2020-04-19T15:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.