BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning
- URL: http://arxiv.org/abs/2104.06685v1
- Date: Wed, 14 Apr 2021 08:16:03 GMT
- Title: BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning
- Authors: Heng Zhu, Qing Ling
- Abstract summary: Communication between workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic and compression noise so as to improve the Byzantine-robustness.
- Score: 24.016538592246377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication between workers and the master node to collect local stochastic
gradients is a key bottleneck in a large-scale federated learning system.
Various recent works have proposed to compress the local stochastic gradients
to mitigate the communication overhead. However, robustness to malicious
attacks is rarely considered in such a setting. In this work, we investigate
the problem of Byzantine-robust federated learning with compression, where the
attacks from Byzantine workers can be arbitrarily malicious. We point out that
a vanilla combination of compressed stochastic gradient descent (SGD) and
geometric median-based robust aggregation suffers from both stochastic and
compression noise in the presence of Byzantine attacks. In light of this
observation, we propose to jointly reduce the stochastic and compression noise
so as to improve the Byzantine-robustness. For the stochastic noise, we adopt
the stochastic average gradient algorithm (SAGA) to gradually eliminate the
inner variations of regular workers. For the compression noise, we apply the
gradient difference compression and achieve compression for free. We
theoretically prove that the proposed algorithm reaches a neighborhood of the
optimal solution at a linear convergence rate, and the asymptotic learning
error is in the same order as that of the state-of-the-art uncompressed method.
Finally, numerical experiments demonstrate the effectiveness of the proposed
method.
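For intuition, here is a minimal NumPy sketch of the three ingredients the abstract describes: a SAGA-style variance-reduced gradient at each regular worker, compression of gradient differences against a running reference (so the transmitted residual, and with it the compression noise, shrinks as the iterates converge), and geometric-median aggregation at the master. The top-k compressor, the Weiszfeld iteration, and all names below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the BROADCAST idea; assumptions: top-k compressor,
# Weiszfeld iterations for the geometric median, dict-based workers.
import numpy as np

def top_k(v, k):
    """Illustrative compressor: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def geometric_median(points, iters=50, eps=1e-8):
    """Weiszfeld iterations: robust aggregation of the workers' messages."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        w = 1.0 / dist
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

def saga_gradient(worker, x, rng):
    """SAGA estimator: sampled gradient, minus its stored copy, plus the
    average of all stored gradients; this drives the stochastic noise
    ('inner variation') of a regular worker to zero over time."""
    i = rng.integers(worker["n"])
    g = worker["grad_fn"](x, i)                 # fresh per-sample gradient
    est = g - worker["table"][i] + worker["table"].mean(axis=0)
    worker["table"][i] = g                      # refresh the stored gradient
    return est

def broadcast_round(workers, x, lr, k, rng):
    """One communication round: each regular worker compresses the
    difference between its SAGA gradient and a local reference h, so the
    compression error also shrinks ('compression for free')."""
    msgs = []
    for w in workers:
        v = saga_gradient(w, x, rng)
        delta = top_k(v - w["h"], k)   # only the compressed difference is sent
        w["h"] = w["h"] + delta        # master keeps an identical copy of h
        msgs.append(w["h"])            # a Byzantine worker could send anything
    return x - lr * geometric_median(np.stack(msgs))
```

Because both the SAGA correction and the reference h converge, such a scheme can approach the learning error of the uncompressed robust method, which is the claim the abstract makes.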
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error [25.15075119957447]
We study Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD).
The performance of such a method is affected by an unknown number of Byzantine agents, which behave adversarially during the optimization process.
arXiv Detail & Related papers (2023-08-10T02:14:23Z)
- $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning [14.363110221372274]
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
However, FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
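A minimal sketch of such a scheme might look as follows; the Gaussian noise and its scale are illustrative assumptions (the summary only says the noise is symmetric), and any distribution-dependent unbiasing factor is omitted.

```python
# Hedged sketch of noisy sign-based compression: perturb the gradient with
# zero-mean symmetric noise, then send one bit per coordinate. The Gaussian
# choice and scale are assumptions for illustration only.
import numpy as np

def noisy_sign_compress(grad, scale, rng):
    """Transmit only the signs of the perturbed gradient (1 bit/coordinate)."""
    return np.sign(grad + rng.normal(0.0, scale, size=grad.shape))

def aggregate_signs(sign_msgs):
    """Server side: average the workers' sign messages; the unbiasing scale
    factor that depends on the noise distribution is omitted here."""
    return np.mean(np.stack(sign_msgs), axis=0)
```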
arXiv Detail & Related papers (2023-02-06T06:54:49Z)
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case without compression.
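A toy version of this pipeline is sketched below, with a Gaussian measurement matrix, a uniform scalar quantizer, and orthogonal matching pursuit standing in for the paper's specific design choices.

```python
# Hedged sketch of gradient compression via quantized compressed sensing;
# the measurement matrix, quantizer, and recovery method are illustrative.
import numpy as np

def sparsify(v, k):
    """Keep the k largest-magnitude entries of the gradient."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def quantize(y, step):
    """Uniform scalar quantization of the compressed measurements."""
    return step * np.round(y / step)

def omp(Phi, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y ~ Phi @ x."""
    residual, support = y.copy(), []
    sol = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ sol
    x = np.zeros(Phi.shape[1])
    x[support] = sol
    return x

rng = np.random.default_rng(0)
d, m, k = 1000, 100, 10
Phi = rng.normal(size=(m, d)) / np.sqrt(m)     # random measurement matrix

# Device: sparsify the gradient, project to m << d dimensions, quantize.
grad = rng.normal(size=d)
y_q = quantize(Phi @ sparsify(grad, k), step=0.05)

# Server: reconstruct the sparse gradient from the quantized measurements.
grad_hat = omp(Phi, y_q, k)
```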
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
- Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm such that it avoids saddle points and fake local minima efficiently.
We obtain theoretical guarantees for our proposed scheme under several approximate settings, including sub-sampled gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local data.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known theoretical bounds.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data [36.99547890386817]
We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.)
An unknown number of Byzantine workers may send malicious messages to the central node, leading to remarkable learning error.
Most of the Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages.
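Two standard examples of such rules, shown here only for illustration (not necessarily the exact rules analyzed in that paper), are the coordinate-wise median and the trimmed mean:

```python
# Illustrative robust aggregation rules; common examples, given here as
# assumptions rather than the specific rules of the paper above.
import numpy as np

def coordinate_median(msgs):
    """Coordinate-wise median over the workers' messages (rows of msgs)."""
    return np.median(msgs, axis=0)

def trimmed_mean(msgs, b):
    """Per coordinate, drop the b smallest and b largest values and average
    the rest; designed to tolerate up to b Byzantine workers."""
    s = np.sort(msgs, axis=0)
    return s[b:s.shape[0] - b].mean(axis=0)
```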
arXiv Detail & Related papers (2020-09-17T09:09:23Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
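To make the biased/unbiased distinction concrete, the sketch below contrasts top-k, a canonical biased compressor with a contraction property, against rescaled rand-k, which is unbiased; the compressors actually proposed in the paper differ.

```python
# Hedged sketch contrasting a biased and an unbiased compressor; these are
# textbook examples, not the new compressors proposed in the paper above.
import numpy as np

def top_k(v, k):
    """Biased compressor: deterministically keeps the largest entries, so
    E[C(v)] != v, but it satisfies the contraction property
    ||C(v) - v||^2 <= (1 - k/d) * ||v||^2."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def rand_k(v, k, rng):
    """Unbiased compressor: keeps k uniformly random entries, rescaled by
    d/k so that E[C(v)] = v (at the price of higher variance)."""
    d = v.size
    out = np.zeros_like(v)
    keep = rng.choice(d, size=k, replace=False)
    out[keep] = v[keep] * (d / k)
    return out
```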
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.