Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning
- URL: http://arxiv.org/abs/2508.17129v2
- Date: Fri, 31 Oct 2025 19:44:57 GMT
- Title: Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning
- Authors: Diksha Gupta, Antonio Honsell, Chuan Xu, Nirupam Gupta, Giovanni Neglia,
- Abstract summary: Distributed learning enables scalable model training over decentralized data, but remains hindered by Byzantine faults and high communication costs. Prior work has shown that naively combining communication compression with Byzantine-robust aggregation can severely weaken resilience to faulty nodes. We introduce RoSDHB, a new algorithm that integrates classical Polyak momentum with a coordinated compression strategy.
- Score: 13.279813908142648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning enables scalable model training over decentralized data, but remains hindered by Byzantine faults and high communication costs. While both challenges have been studied extensively in isolation, their interplay has received limited attention. Prior work has shown that naively combining communication compression with Byzantine-robust aggregation can severely weaken resilience to faulty nodes. The current state-of-the-art, Byz-DASHA-PAGE, leverages a momentum-based variance reduction scheme to counteract the negative effect of compression noise on Byzantine robustness. In this work, we introduce RoSDHB, a new algorithm that integrates classical Polyak momentum with a coordinated compression strategy. Theoretically, RoSDHB matches the convergence guarantees of Byz-DASHA-PAGE under the standard $(G,B)$-gradient dissimilarity model, while relying on milder assumptions and requiring less memory and communication per client. Empirically, RoSDHB demonstrates stronger robustness while achieving substantial communication savings compared to Byz-DASHA-PAGE.
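For intuition only, the toy sketch below combines the three ingredients the abstract names: per-client Polyak (heavy-ball) momentum, compression of the transmitted vector, and a robust aggregation rule at the server. Every concrete choice here (top-k sparsification, coordinate-wise median, the hyperparameters, the function names) is an illustrative assumption; this is not the RoSDHB algorithm or its coordinated compression strategy.

```python
import numpy as np

def topk_compress(v, k):
    """Top-k sparsification: keep only the k largest-magnitude coordinates (illustrative compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def coordinatewise_median(vectors):
    """A simple robust aggregator: coordinate-wise median across client messages."""
    return np.median(np.stack(vectors), axis=0)

def robust_compressed_heavy_ball(grad_fns, x0, steps=100, lr=0.1, beta=0.9, k=10):
    """Toy distributed loop: each client keeps a Polyak momentum buffer, compresses
    what it sends, and the server aggregates the messages robustly. Illustrative only."""
    x = x0.copy()
    momenta = [np.zeros_like(x0) for _ in grad_fns]
    for _ in range(steps):
        messages = []
        for i, grad_fn in enumerate(grad_fns):
            g = grad_fn(x)                                   # local (possibly stochastic) gradient
            momenta[i] = beta * momenta[i] + (1 - beta) * g  # Polyak momentum update
            messages.append(topk_compress(momenta[i], k))    # compress before transmitting
        x = x - lr * coordinatewise_median(messages)         # robust aggregation + model update
    return x
```

Under Byzantine faults, some entries of `messages` would be arbitrary; the coordinate-wise median merely stands in for a robust aggregator, and the momentum buffer is the kind of mechanism that damps the stochastic and compression noise behind which faulty clients could otherwise hide.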
Related papers
- Improving $(\alpha, f)$-Byzantine Resilience in Federated Learning via layerwise aggregation and cosine distance [7.8973037023478785]
Federated Learning (FL) is a potential solution to data privacy challenges in distributed machine learning. FL systems remain vulnerable to Byzantine attacks, where malicious nodes contribute corrupted model updates. This paper introduces Layerwise Cosine Aggregation, a novel aggregation scheme designed to enhance the robustness of existing robust aggregation rules in high-dimensional settings.
arXiv Detail & Related papers (2025-03-27T08:07:39Z) - FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling [11.913924455236652]
Federated Learning (FL) enables collaborative model training across distributed clients without data sharing. Current methods use dynamic pruning to improve efficiency by periodically adjusting sparse model topologies while maintaining sparsity. We propose Federated Robust pruning via Thompson Sampling (FedRTS), a novel framework designed to develop robust sparse models.
arXiv Detail & Related papers (2025-01-31T13:26:22Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contains privacy-sensitive user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to Byzantine robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z) - Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression (a minimal error-feedback sketch appears after this list).
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with a zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Communication Compression for Byzantine Robust Learning: New Efficient
Algorithms and Improved Rates [9.965047642855074]
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems.
We propose a new Byzantine-robust compression method with a better convergence rate.
We also develop the first Byzantine-robust method with communication compression and error feedback.
arXiv Detail & Related papers (2023-10-15T11:22:34Z) - Enhancing Contrastive Learning with Noise-Guided Attack: Towards
Continual Relation Extraction in the Wild [57.468184469589744]
We develop a noise-resistant contrastive framework named Noise-guided Attack in Contrastive Learning (NaCL).
Rather than directly discarding noisy samples or relabeling noise, which is often inaccessible, we propose modifying the feature space to match the given noisy labels via attacks.
arXiv Detail & Related papers (2023-05-11T18:48:18Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker
Assumptions and Communication Compression as a Cherry on the Top [10.579228752210168]
Byzantine-robustness has gained a lot of attention due to the growing interest in collaborative and fault-tolerant learning.
Byz-VR-MARINA is a new Byzantine-robust approach with variance reduction and communication compression.
Unlike prior Byzantine-robust methods with compressed gradients, our results are tight and do not rely on restrictive assumptions such as boundedness or limited compression.
arXiv Detail & Related papers (2022-06-01T14:40:29Z) - Bridging Differential Privacy and Byzantine-Robustness via Model
Aggregation [27.518542543750367]
This paper aims at addressing two conflicting issues in federated learning: differential privacy and Byzantine-robustness.
Standard DP mechanisms add noise to the transmitted messages, which entangles with the robust gradient aggregation used to defend against Byzantine attacks.
We show that the influence of our proposed DP mechanisms is decoupled from that of the robust model aggregation.
arXiv Detail & Related papers (2022-04-29T23:37:46Z) - What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z) - Federated Variance-Reduced Stochastic Gradient Descent with Robustness
to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.