Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data
- URL: http://arxiv.org/abs/2009.08161v2
- Date: Sat, 28 Aug 2021 11:00:06 GMT
- Title: Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data
- Authors: Jie Peng, Zhaoxian Wu, Qing Ling and Tianyi Chen
- Abstract summary: We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.).
An unknown number of Byzantine workers may send malicious messages to the central node, leading to a significant learning error.
Most of the Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages.
- Score: 36.99547890386817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the federated learning problem where data on workers are not
independent and identically distributed (i.i.d.). During the learning process,
an unknown number of Byzantine workers may send malicious messages to the
central node, leading to a significant learning error. Most of the
Byzantine-robust methods address this issue by using robust aggregation rules
to aggregate the received messages, but rely on the assumption that all the
regular workers have i.i.d. data, which is not the case in many federated
learning applications. In light of the significance of reducing stochastic
gradient noise for mitigating the effect of Byzantine attacks, we use a
resampling strategy to reduce the impact of both inner variation (that
describes the sample heterogeneity on every regular worker) and outer variation
(that describes the sample heterogeneity among the regular workers), along with
a stochastic average gradient algorithm to gradually eliminate the inner
variation. The variance-reduced messages are then aggregated with a robust
geometric median operator. We prove that the proposed method reaches a
neighborhood of the optimal solution at a linear convergence rate and the
learning error is determined by the number of Byzantine workers. Numerical
experiments corroborate the theoretical results and show that the proposed
method outperforms state-of-the-art alternatives in the non-i.i.d. setting.
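To make the pipeline described in the abstract concrete, here is a minimal sketch of its three ingredients: a SAGA-style variance-reduced message computed on each regular worker, a schematic resampling step that averages randomly drawn groups of received messages, and geometric-median aggregation via Weiszfeld iterations. This is not the authors' implementation; the helper names (SagaWorker, resample_messages, weiszfeld_geometric_median), the group size, the step size, and the exact form of the resampling rule are illustrative assumptions.

```python
# Illustrative sketch, not the paper's released code. The resampling rule,
# step size, and group size below are assumptions made for readability.
import numpy as np

def weiszfeld_geometric_median(points, n_iters=100, eps=1e-8):
    """Approximate geometric median of a list of vectors (robust aggregation rule)."""
    z = np.mean(points, axis=0)                      # start from the plain average
    for _ in range(n_iters):
        dists = np.array([max(np.linalg.norm(p - z), eps) for p in points])
        z_new = np.average(points, axis=0, weights=1.0 / dists)
        shift, z = np.linalg.norm(z_new - z), z_new
        if shift < eps:
            break
    return z

def resample_messages(messages, group_size, rng):
    """Average randomly drawn groups of messages to damp inner and outer
    variation before the geometric median is applied (schematic resampling)."""
    idx = rng.permutation(len(messages))
    groups = [idx[i:i + group_size] for i in range(0, len(idx), group_size)]
    return [np.mean([messages[j] for j in g], axis=0) for g in groups]

class SagaWorker:
    """A regular worker holding a local non-i.i.d. dataset.

    A table of the latest per-sample gradients provides the SAGA-style
    correction that gradually removes the inner (within-worker) variation."""
    def __init__(self, grad_fn, n_samples, dim):
        self.grad_fn = grad_fn                       # grad_fn(x, i): gradient of sample i at x
        self.table = np.zeros((n_samples, dim))      # stored per-sample gradients
        self.table_avg = np.zeros(dim)
        self.n = n_samples

    def message(self, x, batch):
        """Variance-reduced message for a minibatch of sample indices."""
        old_avg = self.table_avg.copy()
        corr = np.zeros_like(x, dtype=float)
        for i in batch:
            gi = self.grad_fn(x, i)
            corr += gi - self.table[i]
            self.table_avg += (gi - self.table[i]) / self.n   # keep running average in sync
            self.table[i] = gi
        return corr / len(batch) + old_avg

def robust_round(x, workers, rng, batch_size=8, group_size=2, lr=0.05):
    """One communication round: collect (possibly corrupted) messages,
    resample them, aggregate with the geometric median, and update the model."""
    msgs = []
    for w in workers:
        batch = rng.integers(0, w.n, size=batch_size)
        msgs.append(w.message(x, batch))             # a Byzantine worker could send anything here
    resampled = resample_messages(msgs, group_size, rng)
    return x - lr * weiszfeld_geometric_median(resampled)
```

In this sketch a Byzantine worker can be simulated by any object whose message method returns an arbitrary vector; the geometric median keeps the aggregated direction close to the majority of regular workers, while the variance reduction and resampling shrink the spread of the honest messages that the median has to tolerate.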
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient [0.0]
We propose a new method -- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient (BR-LSVRG).
We derive non-asymptotic convergence guarantees for the new method in the strongly convex case.
arXiv Detail & Related papers (2023-03-08T13:20:49Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) where the problem data are heterogeneous (non-i.i.d.) and distributed across many devices.
We make a very general assumption on the computational network that covers fully decentralized computation.
We theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Stochastic Alternating Direction Method of Multipliers for Byzantine-Robust Distributed Learning [22.835940007753376]
We propose a Byzantine-robust alternating direction method of multipliers (ADMM) that fully utilizes the separable problem structure.
Theoretically, we prove that the proposed method converges to a bounded neighborhood of the optimal solution at a rate of O(1/k) under mild assumptions.
Numerical experiments on the MNIST and COVERTYPE datasets demonstrate the effectiveness of the proposed method under various Byzantine attacks.
arXiv Detail & Related papers (2021-06-13T01:17:31Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic gradient noise and the compression noise so as to improve the Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)