BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning
- URL: http://arxiv.org/abs/2104.06685v1
- Date: Wed, 14 Apr 2021 08:16:03 GMT
- Title: BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning
- Authors: Heng Zhu, Qing Ling
- Abstract summary: Communication between workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic and compression noise so as to improve the Byzantine-robustness.
- Score: 24.016538592246377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication between workers and the master node to collect local stochastic
gradients is a key bottleneck in a large-scale federated learning system.
Various recent works have proposed to compress the local stochastic gradients
to mitigate the communication overhead. However, robustness to malicious
attacks is rarely considered in such a setting. In this work, we investigate
the problem of Byzantine-robust federated learning with compression, where the
attacks from Byzantine workers can be arbitrarily malicious. We point out that
a vanilla combination of compressed stochastic gradient descent (SGD) and
geometric median-based robust aggregation suffers from both stochastic and
compression noise in the presence of Byzantine attacks. In light of this
observation, we propose to jointly reduce the stochastic and compression noise
so as to improve the Byzantine-robustness. For the stochastic noise, we adopt
the stochastic average gradient algorithm (SAGA) to gradually eliminate the
inner variations of regular workers. For the compression noise, we apply the
gradient difference compression and achieve compression for free. We
theoretically prove that the proposed algorithm reaches a neighborhood of the
optimal solution at a linear convergence rate, and the asymptotic learning
error is in the same order as that of the state-of-the-art uncompressed method.
Finally, numerical experiments demonstrate the effectiveness of the proposed
method.
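For intuition, here is a minimal NumPy sketch of the three ingredients the abstract describes: a SAGA-style variance-reduced gradient at each regular worker, compression of gradient differences against a running reference (so the transmitted residual, and with it the compression noise, shrinks as the iterates converge), and geometric-median aggregation at the master. The top-k compressor, the Weiszfeld iteration, and all names below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the BROADCAST idea; assumptions: top-k compressor,
# Weiszfeld iterations for the geometric median, dict-based workers.
import numpy as np

def top_k(v, k):
    """Illustrative compressor: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def geometric_median(points, iters=50, eps=1e-8):
    """Weiszfeld iterations: robust aggregation of the workers' messages."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        w = 1.0 / dist
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

def saga_gradient(worker, x, rng):
    """SAGA estimator: sampled gradient, minus its stored copy, plus the
    average of all stored gradients; this drives the stochastic noise
    ('inner variation') of a regular worker to zero over time."""
    i = rng.integers(worker["n"])
    g = worker["grad_fn"](x, i)                 # fresh per-sample gradient
    est = g - worker["table"][i] + worker["table"].mean(axis=0)
    worker["table"][i] = g                      # refresh the stored gradient
    return est

def broadcast_round(workers, x, lr, k, rng):
    """One communication round: each regular worker compresses the
    difference between its SAGA gradient and a local reference h, so the
    compression error also shrinks ('compression for free')."""
    msgs = []
    for w in workers:
        v = saga_gradient(w, x, rng)
        delta = top_k(v - w["h"], k)   # only the compressed difference is sent
        w["h"] = w["h"] + delta        # master keeps an identical copy of h
        msgs.append(w["h"])            # a Byzantine worker could send anything
    return x - lr * geometric_median(np.stack(msgs))
```

Because both the SAGA correction and the reference h converge, such a scheme can approach the learning error of the uncompressed robust method, which is the claim the abstract makes.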
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error [25.15075119957447]
We study Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD).
The performance of such a method is affected by an unknown number of Byzantine agents, which behave adversarially during the optimization process.
arXiv Detail & Related papers (2023-08-10T02:14:23Z)
- $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning [14.363110221372274]
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
However, FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
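A minimal sketch of such a scheme might look as follows; the Gaussian noise and its scale are illustrative assumptions (the summary only says the noise is symmetric), and any distribution-dependent unbiasing factor is omitted.

```python
# Hedged sketch of noisy sign-based compression: perturb the gradient with
# zero-mean symmetric noise, then send one bit per coordinate. The Gaussian
# choice and scale are assumptions for illustration only.
import numpy as np

def noisy_sign_compress(grad, scale, rng):
    """Transmit only the signs of the perturbed gradient (1 bit/coordinate)."""
    return np.sign(grad + rng.normal(0.0, scale, size=grad.shape))

def aggregate_signs(sign_msgs):
    """Server side: average the workers' sign messages; the unbiasing scale
    factor that depends on the noise distribution is omitted here."""
    return np.mean(np.stack(sign_msgs), axis=0)
```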
arXiv Detail & Related papers (2023-02-06T06:54:49Z)
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case without compression.
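A toy version of this pipeline is sketched below, with a Gaussian measurement matrix, a uniform scalar quantizer, and orthogonal matching pursuit standing in for the paper's specific design choices.

```python
# Hedged sketch of gradient compression via quantized compressed sensing;
# the measurement matrix, quantizer, and recovery method are illustrative.
import numpy as np

def sparsify(v, k):
    """Keep the k largest-magnitude entries of the gradient."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def quantize(y, step):
    """Uniform scalar quantization of the compressed measurements."""
    return step * np.round(y / step)

def omp(Phi, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y ~ Phi @ x."""
    residual, support = y.copy(), []
    sol = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ sol
    x = np.zeros(Phi.shape[1])
    x[support] = sol
    return x

rng = np.random.default_rng(0)
d, m, k = 1000, 100, 10
Phi = rng.normal(size=(m, d)) / np.sqrt(m)     # random measurement matrix

# Device: sparsify the gradient, project to m << d dimensions, quantize.
grad = rng.normal(size=d)
y_q = quantize(Phi @ sparsify(grad, k), step=0.05)

# Server: reconstruct the sparse gradient from the quantized measurements.
grad_hat = omp(Phi, y_q, k)
```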
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
- Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm such that it avoids saddle points and fake local minima efficiently.
We obtain theoretical guarantees for our proposed scheme under several approximate settings, including sub-sampled gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local data.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known theoretical bounds.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data [36.99547890386817]
We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.)
An unknown number of Byzantine workers may send malicious messages to the central node, leading to remarkable learning error.
Most of the Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages.
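Two standard examples of such rules, shown here only for illustration (not necessarily the exact rules analyzed in that paper), are the coordinate-wise median and the trimmed mean:

```python
# Illustrative robust aggregation rules; common examples, given here as
# assumptions rather than the specific rules of the paper above.
import numpy as np

def coordinate_median(msgs):
    """Coordinate-wise median over the workers' messages (rows of msgs)."""
    return np.median(msgs, axis=0)

def trimmed_mean(msgs, b):
    """Per coordinate, drop the b smallest and b largest values and average
    the rest; designed to tolerate up to b Byzantine workers."""
    s = np.sort(msgs, axis=0)
    return s[b:s.shape[0] - b].mean(axis=0)
```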
arXiv Detail & Related papers (2020-09-17T09:09:23Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
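To make the biased/unbiased distinction concrete, the sketch below contrasts top-k, a canonical biased compressor with a contraction property, against rescaled rand-k, which is unbiased; the compressors actually proposed in the paper differ.

```python
# Hedged sketch contrasting a biased and an unbiased compressor; these are
# textbook examples, not the new compressors proposed in the paper above.
import numpy as np

def top_k(v, k):
    """Biased compressor: deterministically keeps the largest entries, so
    E[C(v)] != v, but it satisfies the contraction property
    ||C(v) - v||^2 <= (1 - k/d) * ||v||^2."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def rand_k(v, k, rng):
    """Unbiased compressor: keeps k uniformly random entries, rescaled by
    d/k so that E[C(v)] = v (at the price of higher variance)."""
    d = v.size
    out = np.zeros_like(v)
    keep = rng.choice(d, size=k, replace=False)
    out[keep] = v[keep] * (d / k)
    return out
```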
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.