Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering
- URL: http://arxiv.org/abs/2409.08640v1
- Date: Fri, 13 Sep 2024 08:53:10 GMT
- Title: Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering
- Authors: Changxin Liu, Yanghao Li, Yuhao Yi, Karl H. Johansson
- Abstract summary: Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to Byzantine robustness and communication reduction.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
- Score: 17.446431849022346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning has become the standard approach for training large-scale machine learning models across private data silos. While distributed learning enhances privacy preservation and training efficiency, it faces critical challenges related to Byzantine robustness and communication reduction. Existing Byzantine-robust and communication-efficient methods rely on full gradient information either at every iteration or at certain iterations with a probability, and they only converge to an unnecessarily large neighborhood around the solution. Motivated by these issues, we propose a novel Byzantine-robust and communication-efficient stochastic distributed learning method that imposes no requirements on batch size and converges to a smaller neighborhood around the optimal solution than all existing methods, aligning with the theoretical lower bound. Our key innovation is leveraging Polyak Momentum to mitigate the noise caused by both biased compressors and stochastic gradients, thus defending against Byzantine workers under information compression. We provide proof of tight complexity bounds for our algorithm in the context of non-convex smooth loss functions, demonstrating that these bounds match the lower bounds in Byzantine-free scenarios. Finally, we validate the practical significance of our algorithm through an extensive series of experiments, benchmarking its performance on both binary classification and image classification tasks.
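The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of the general recipe it outlines: each worker maintains a Polyak (heavy-ball) momentum of its stochastic gradients, compresses it before uploading, and the server combines the uploads with a robust filter before taking a step. The top-k compressor, the coordinate-wise trimmed-mean aggregator, and the parameter names (`beta`, `k`, `lr`) are illustrative assumptions, not the paper's exact filtering rule or compressor.

```python
import numpy as np


def topk_compress(v, k):
    """Biased top-k sparsification: keep only the k largest-magnitude entries (assumed compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def robust_aggregate(messages, trim=0.2):
    """Coordinate-wise trimmed mean: a placeholder robust filter, not the paper's specific rule."""
    msgs = np.stack(messages)                            # shape (num_workers, dim)
    lo, hi = np.quantile(msgs, [trim, 1 - trim], axis=0)
    return np.clip(msgs, lo, hi).mean(axis=0)


def server_round(x, worker_grads, momenta, beta=0.9, k=10, lr=0.1):
    """One synchronous round: workers update momentum locally, compress, server filters and steps."""
    uploads = []
    for i, g in enumerate(worker_grads):
        momenta[i] = beta * momenta[i] + (1 - beta) * g  # Polyak (heavy-ball) momentum
        uploads.append(topk_compress(momenta[i], k))     # compressed message sent to the server
    return x - lr * robust_aggregate(uploads), momenta
```

The intuition the sketch tries to capture: averaging gradients into a momentum vector shrinks the variance of each upload over time, which makes it easier for a robust aggregator to separate honest workers from Byzantine ones even after lossy compression.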
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
- Byzantine-Resilient Federated Learning at Edge [20.742023657098525]
We present a Byzantine-resilient descent algorithm that can handle heavy-tailed data.
We also propose an algorithm that incorporates costs during the learning process.
arXiv Detail & Related papers (2023-03-18T15:14:16Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in large-scale learning systems.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and the compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- Byzantine Resilient Distributed Multi-Task Learning [6.850757447639822]
We show that distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents.
We propose an approach for Byzantine resilient distributed multi-task learning.
arXiv Detail & Related papers (2020-10-25T04:32:52Z)
- Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data [36.99547890386817]
We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.).
An unknown number of Byzantine workers may send malicious messages to the central node, leading to significant learning error.
Most Byzantine-robust methods address this issue by using robust aggregation rules to combine the received messages; a minimal sketch of one such rule appears after this list.
arXiv Detail & Related papers (2020-09-17T09:09:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
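Several of the entries above (e.g., the variance-reduced federated learning paper) rely on generic robust aggregation rules. As referenced in that item, here is a minimal sketch of one classic rule, the coordinate-wise median; it is an illustrative example under simple assumptions, not the specific aggregator used by any of the listed papers.

```python
import numpy as np


def coordinate_wise_median(messages):
    """Classic Byzantine-robust aggregation: take the median of each coordinate across workers."""
    return np.median(np.stack(messages), axis=0)


# Toy check: 7 honest workers reporting values near 1.0 plus 2 Byzantine workers sending huge vectors.
rng = np.random.default_rng(0)
honest = [rng.normal(loc=1.0, scale=0.1, size=5) for _ in range(7)]
byzantine = [np.full(5, 1e6), np.full(5, -1e6)]
print(coordinate_wise_median(honest + byzantine))  # stays close to the honest average
```

As long as fewer than half of the workers are Byzantine, the per-coordinate median cannot be pulled arbitrarily far from the honest values, which is the basic property the robust rules in these papers build on.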