Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data
- URL: http://arxiv.org/abs/2009.08161v2
- Date: Sat, 28 Aug 2021 11:00:06 GMT
- Title: Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data
- Authors: Jie Peng, Zhaoxian Wu, Qing Ling and Tianyi Chen
- Abstract summary: We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.).
An unknown number of Byzantine workers may send malicious messages to the central node, leading to a significant learning error.
Most of the Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages.
- Score: 36.99547890386817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the federated learning problem where data on workers are not
independent and identically distributed (i.i.d.). During the learning process,
an unknown number of Byzantine workers may send malicious messages to the
central node, leading to a significant learning error. Most of the
Byzantine-robust methods address this issue by using robust aggregation rules
to aggregate the received messages, but rely on the assumption that all the
regular workers have i.i.d. data, which is not the case in many federated
learning applications. In light of the significance of reducing stochastic
gradient noise for mitigating the effect of Byzantine attacks, we use a
resampling strategy to reduce the impact of both inner variation (that
describes the sample heterogeneity on every regular worker) and outer variation
(that describes the sample heterogeneity among the regular workers), along with
a stochastic average gradient algorithm to gradually eliminate the inner
variation. The variance-reduced messages are then aggregated with a robust
geometric median operator. We prove that the proposed method reaches a
neighborhood of the optimal solution at a linear convergence rate and the
learning error is determined by the number of Byzantine workers. Numerical
experiments corroborate the theoretical results and show that the proposed
method outperforms state-of-the-art alternatives in the non-i.i.d. setting.
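To make the pipeline described in the abstract concrete, here is a minimal sketch of its three ingredients: a SAGA-style variance-reduced message computed on each regular worker, a schematic resampling step that averages randomly drawn groups of received messages, and geometric-median aggregation via Weiszfeld iterations. This is not the authors' implementation; the helper names (SagaWorker, resample_messages, weiszfeld_geometric_median), the group size, the step size, and the exact form of the resampling rule are illustrative assumptions.

```python
# Illustrative sketch, not the paper's released code. The resampling rule,
# step size, and group size below are assumptions made for readability.
import numpy as np

def weiszfeld_geometric_median(points, n_iters=100, eps=1e-8):
    """Approximate geometric median of a list of vectors (robust aggregation rule)."""
    z = np.mean(points, axis=0)                      # start from the plain average
    for _ in range(n_iters):
        dists = np.array([max(np.linalg.norm(p - z), eps) for p in points])
        z_new = np.average(points, axis=0, weights=1.0 / dists)
        shift, z = np.linalg.norm(z_new - z), z_new
        if shift < eps:
            break
    return z

def resample_messages(messages, group_size, rng):
    """Average randomly drawn groups of messages to damp inner and outer
    variation before the geometric median is applied (schematic resampling)."""
    idx = rng.permutation(len(messages))
    groups = [idx[i:i + group_size] for i in range(0, len(idx), group_size)]
    return [np.mean([messages[j] for j in g], axis=0) for g in groups]

class SagaWorker:
    """A regular worker holding a local non-i.i.d. dataset.

    A table of the latest per-sample gradients provides the SAGA-style
    correction that gradually removes the inner (within-worker) variation."""
    def __init__(self, grad_fn, n_samples, dim):
        self.grad_fn = grad_fn                       # grad_fn(x, i): gradient of sample i at x
        self.table = np.zeros((n_samples, dim))      # stored per-sample gradients
        self.table_avg = np.zeros(dim)
        self.n = n_samples

    def message(self, x, batch):
        """Variance-reduced message for a minibatch of sample indices."""
        old_avg = self.table_avg.copy()
        corr = np.zeros_like(x, dtype=float)
        for i in batch:
            gi = self.grad_fn(x, i)
            corr += gi - self.table[i]
            self.table_avg += (gi - self.table[i]) / self.n   # keep running average in sync
            self.table[i] = gi
        return corr / len(batch) + old_avg

def robust_round(x, workers, rng, batch_size=8, group_size=2, lr=0.05):
    """One communication round: collect (possibly corrupted) messages,
    resample them, aggregate with the geometric median, and update the model."""
    msgs = []
    for w in workers:
        batch = rng.integers(0, w.n, size=batch_size)
        msgs.append(w.message(x, batch))             # a Byzantine worker could send anything here
    resampled = resample_messages(msgs, group_size, rng)
    return x - lr * weiszfeld_geometric_median(resampled)
```

In this sketch a Byzantine worker can be simulated by any object whose message method returns an arbitrary vector; the geometric median keeps the aggregated direction close to the majority of regular workers, while the variance reduction and resampling shrink the spread of the honest messages that the median has to tolerate.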
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient [0.0]
We propose a new method -- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient (BR-LSVRG).
We derive non-asymptotic convergence guarantees for the new method in the strongly convex case.
arXiv Detail & Related papers (2023-03-08T13:20:49Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) where the problem data are heterogeneous (non-i.i.d.) and distributed across many devices.
We make a very general assumption on the computational network that covers fully decentralized computation.
We theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Stochastic Alternating Direction Method of Multipliers for Byzantine-Robust Distributed Learning [22.835940007753376]
We propose a Byzantine-robust alternating direction method of multipliers (ADMM) that fully utilizes the separable problem structure.
Theoretically, we prove that the proposed method converges to a bounded neighborhood of the optimal solution at a rate of O(1/k) under mild assumptions.
Numerical experiments on the MNIST and COVERTYPE datasets demonstrate the effectiveness of the proposed method under various Byzantine attacks.
arXiv Detail & Related papers (2021-06-13T01:17:31Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic gradient noise and the compression noise so as to improve the Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)