Holdout SGD: Byzantine Tolerant Federated Learning
- URL: http://arxiv.org/abs/2008.04612v1
- Date: Tue, 11 Aug 2020 10:16:37 GMT
- Title: Holdout SGD: Byzantine Tolerant Federated Learning
- Authors: Shahar Azulay, Lior Raz, Amir Globerson, Tomer Koren, Yehuda Afek
- Abstract summary: This work presents a new distributed Byzantine tolerant federated learning algorithm, HoldOut SGD, for Stochastic Gradient Descent (SGD) optimization.
HoldOut SGD uses the well-known machine learning technique of holdout estimation, in a distributed fashion, to select parameter updates that are likely to lead to models with low loss values.
We provide formal guarantees for the HoldOut SGD process in terms of its convergence to the optimal model, and its level of resilience to the fraction of Byzantine workers.
- Score: 43.446891082719944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a new distributed Byzantine tolerant federated learning
algorithm, HoldOut SGD, for Stochastic Gradient Descent (SGD) optimization.
HoldOut SGD uses the well-known machine learning technique of holdout
estimation, in a distributed fashion, in order to select parameter updates that
are likely to lead to models with low loss values. This makes it more effective
at discarding Byzantine workers' inputs than existing methods that eliminate
outliers in the parameter-space of the learned model. HoldOut SGD first
randomly selects a set of workers that use their private data in order to
propose gradient updates. Next, a voting committee of workers is randomly
selected, and each voter uses its private data as holdout data, in order to
select the best proposals via a voting scheme. We propose two possible
mechanisms for the coordination of workers in the distributed computation of
HoldOut SGD. The first uses a truthful central server and corresponds to the
typical setting of current federated learning. The second is fully distributed
and requires no central server, paving the way to fully decentralized federated
learning. The fully distributed version implements HoldOut SGD via ideas from
the blockchain domain, and specifically the Algorand committee selection and
consensus processes. We provide formal guarantees for the HoldOut SGD process
in terms of its convergence to the optimal model, and its level of resilience
to the fraction of Byzantine workers. Empirical evaluation shows that HoldOut
SGD is Byzantine-resilient and efficiently converges to an effective model for
deep-learning tasks, as long as the total number of participating workers is
large and the fraction of Byzantine workers is less than half (<1/3 for the
fully distributed variant).
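Below is a minimal, self-contained sketch of one HoldOut SGD round as described in the abstract: a random set of proposers compute gradient-based candidate updates, a random voting committee scores each candidate on its own private holdout data, and the top-voted candidates are averaged. The toy loss, the hyperparameters (`N_PROPOSERS`, `N_VOTERS`, `TOP_K`, `LR`), and the exact voting/aggregation rule are illustrative assumptions, not the authors' reference implementation.
```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_WORKERS, N_PROPOSERS, N_VOTERS, TOP_K, LR = 10, 50, 8, 12, 3, 0.1

# Toy quadratic loss per worker: L_i(w) = ||w - target_i||^2, targets drawn near a common optimum.
targets = rng.normal(loc=1.0, scale=0.2, size=(N_WORKERS, DIM))

def local_gradient(w, i):
    """Gradient of worker i's private (toy) loss."""
    return 2.0 * (w - targets[i])

def holdout_loss(w, i):
    """Worker i evaluates a candidate model on its private holdout data."""
    return float(np.sum((w - targets[i]) ** 2))

def holdout_sgd_round(w, byzantine=frozenset()):
    workers = rng.permutation(N_WORKERS)
    proposers = workers[:N_PROPOSERS]
    voters = workers[N_PROPOSERS:N_PROPOSERS + N_VOTERS]

    # 1) Proposers compute gradient-based candidate updates; Byzantine proposers send garbage.
    #    (Byzantine voters are omitted from this toy sketch for brevity.)
    proposals = []
    for i in proposers:
        g = rng.normal(scale=50.0, size=DIM) if i in byzantine else local_gradient(w, i)
        proposals.append(w - LR * g)

    # 2) Each voter scores every proposal on its own holdout data and votes for its TOP_K favourites.
    votes = np.zeros(len(proposals))
    for j in voters:
        ranking = np.argsort([holdout_loss(p, j) for p in proposals])
        votes[ranking[:TOP_K]] += 1

    # 3) The TOP_K proposals with the most votes are averaged into the next model.
    winners = np.argsort(-votes)[:TOP_K]
    return np.mean([proposals[k] for k in winners], axis=0)

w = np.zeros(DIM)
bad = frozenset(rng.choice(N_WORKERS, size=N_WORKERS // 5, replace=False))  # 20% Byzantine workers
for _ in range(100):
    w = holdout_sgd_round(w, byzantine=bad)
print("distance to optimum:", np.linalg.norm(w - targets.mean(axis=0)))
```
The point this sketch illustrates is that proposals are filtered by their measured holdout loss rather than by distance in parameter space, so garbage updates from Byzantine proposers simply never collect votes.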
Related papers
- A Federated Distributionally Robust Support Vector Machine with Mixture of Wasserstein Balls Ambiguity Set for Distributed Fault Diagnosis [3.662364375995991]
We study the problem of training a distributionally robust (DR) support vector machine (SVM) in a federated fashion over a network comprised of a central server and $G$ clients without sharing data.
We propose two distributed optimization algorithms for training the global FDR-SVM.
arXiv Detail & Related papers (2024-10-04T19:21:45Z) - Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach [0.0]
The centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to epsilon-optimally solving decentralized partially observable Markov decision processes (Dec-POMDPs).
This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution.
arXiv Detail & Related papers (2024-08-23T15:01:37Z) - SABLE: Secure And Byzantine robust LEarning [9.455980760111498]
Homomorphic encryption (HE) has emerged as a leading security measure to preserve privacy in distributed learning.
This paper introduces SABLE, the first homomorphic and Byzantine robust distributed learning algorithm.
arXiv Detail & Related papers (2023-09-11T11:54:42Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated
Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well-known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting.
arXiv Detail & Related papers (2021-11-19T12:58:59Z) - Combining Differential Privacy and Byzantine Resilience in Distributed
SGD [9.14589517827682]
This paper studies the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model.
We show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP.
arXiv Detail & Related papers (2021-10-08T09:23:03Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient
Descent with Differential Privacy [15.038697541988746]
A popular distributed learning strategy is federated learning, where there is a central server storing the global model and a set of local computing nodes updating the model parameters with their corresponding data.
In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD framework, or A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference by malicious participants.
arXiv Detail & Related papers (2020-08-21T00:56:22Z) - Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost (a minimal sketch of the bucketing idea appears after this list).
arXiv Detail & Related papers (2020-06-16T17:58:53Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
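As referenced above, here is a minimal sketch of the bucketing idea from the "Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing" entry: worker gradients are randomly shuffled into small buckets, each bucket is averaged, and an existing robust aggregator is applied to the bucket means. The bucket size `s` and the coordinate-wise-median stand-in for the robust aggregator are illustrative assumptions based only on the abstract.
```python
import numpy as np

def bucketing_aggregate(gradients, s=2, robust_agg=None, rng=None):
    """Hedged sketch of bucketing: shuffle worker gradients, average them in buckets
    of size s, then hand the bucket means to an existing robust aggregator."""
    rng = rng or np.random.default_rng()
    robust_agg = robust_agg or (lambda G: np.median(G, axis=0))  # coordinate-wise median as a stand-in
    G = np.asarray(gradients)
    G = G[rng.permutation(len(G))]                               # random shuffle of workers
    n_buckets = int(np.ceil(len(G) / s))
    bucket_means = [G[i * s:(i + 1) * s].mean(axis=0) for i in range(n_buckets)]
    return robust_agg(np.stack(bucket_means))

# Toy usage: 8 honest workers with heterogeneous means plus 2 Byzantine workers sending large noise.
rng = np.random.default_rng(1)
honest = [rng.normal(loc=mu, size=5) for mu in np.linspace(-1, 1, 8)]
byzantine = [rng.normal(loc=100.0, size=5) for _ in range(2)]
print(bucketing_aggregate(honest + byzantine, s=2, rng=rng))
```
Averaging within random buckets reduces the heterogeneity seen by the robust aggregator, which is the property the abstract highlights for heterogeneous datasets.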
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.