Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML
- URL: http://arxiv.org/abs/2501.09621v1
- Date: Thu, 16 Jan 2025 16:00:52 GMT
- Title: Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML
- Authors: Tehila Dahan, Kfir Y. Levy
- Abstract summary: Asynchronous systems struggle with maintaining integrity against Byzantine failures.
We introduce a novel weighted robust aggregation framework to tackle these issues.
We achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment.
- Score: 8.419845742978985
- Abstract: We address the challenges of Byzantine-robust training in asynchronous distributed machine learning systems, aiming to enhance efficiency amid massive parallelization and heterogeneous computing resources. Asynchronous systems, marked by independently operating workers and intermittent updates, uniquely struggle with maintaining integrity against Byzantine failures, which encompass malicious or erroneous actions that disrupt learning. The inherent delays in such settings not only introduce additional bias to the system but also obscure the disruptions caused by Byzantine faults. To tackle these issues, we adapt the Byzantine framework to asynchronous dynamics by introducing a novel weighted robust aggregation framework. This allows for the extension of robust aggregators and a recent meta-aggregator to their weighted versions, mitigating the effects of delayed updates. By further incorporating a recent variance-reduction technique, we achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment. Our methodology is rigorously validated through empirical and theoretical analysis, demonstrating its effectiveness in enhancing fault tolerance and optimizing performance in asynchronous ML systems.
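The abstract describes two ingredients: weighting each worker's update according to its delay, and then applying a robust (Byzantine-tolerant) aggregator to the weighted updates. The sketch below is purely illustrative and is not the paper's construction: the exponential staleness weights, the half_life parameter, and the choice of a weighted geometric median as the robust aggregator are all assumptions made for this example.
```python
# Purely illustrative: a "weight by staleness, then robustly aggregate" step.
# The exponential weights, half_life, and the weighted geometric median are
# assumptions for this example, not the aggregator proposed in the paper.
import numpy as np

def staleness_weights(delays, half_life=10.0):
    """Hypothetical rule: discount a worker's update exponentially in its delay."""
    w = 0.5 ** (np.asarray(delays, dtype=float) / half_life)
    return w / w.sum()

def weighted_geometric_median(updates, weights, iters=50, eps=1e-8):
    """Weighted geometric median via Weiszfeld iterations (a robust aggregator)."""
    updates = np.asarray(updates, dtype=float)          # (n_workers, dim)
    z = np.average(updates, axis=0, weights=weights)    # start at the weighted mean
    for _ in range(iters):
        dist = np.linalg.norm(updates - z, axis=1) + eps
        coef = weights / dist
        z = (coef[:, None] * updates).sum(axis=0) / coef.sum()
    return z

# Example: three honest gradients (two of them stale) plus one Byzantine outlier.
grads  = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
          np.array([0.9, 1.2]), np.array([50.0, -50.0])]
delays = [0, 1, 8, 0]
print(weighted_geometric_median(grads, staleness_weights(delays)))  # ~[1, 1]
```
The example only illustrates the interface implied by the abstract: delays determine the weights, and the aggregator must tolerate a minority of arbitrary (Byzantine) inputs.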
Related papers
- Optimizing Asynchronous Federated Learning: A Delicate Trade-Off Between Model-Parameter Staleness and Update Frequency [0.9999629695552195]
We use gradient modeling to better understand the impact of design choices in asynchronous FL algorithms.
We characterize in particular a fundamental trade-off between model-parameter staleness and update frequency for optimizing asynchronous FL.
We show that these optimizations enhance accuracy by 10% to 30%.
arXiv Detail & Related papers (2025-02-12T08:38:13Z) - Digital Twin-Assisted Federated Learning with Blockchain in Multi-tier Computing Systems [67.14406100332671]
In Industry 4.0 systems, resource-constrained edge devices engage in frequent data interactions.
This paper proposes a digital twin (DT)-assisted federated learning (FL) scheme.
The efficacy of our proposed cooperative interference-based FL process has been verified through numerical analysis.
arXiv Detail & Related papers (2024-11-04T17:48:02Z) - FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
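As a rough illustration of what incorporating asynchronous updates into adaptive federated optimization can look like, the sketch below buffers asynchronously arriving client deltas and applies an AMSGrad-style adaptive step on the server. The buffering rule, the optimizer choice, and all hyperparameters are assumptions for illustration; FADAS's actual algorithm (including its delay-adaptive variants) is specified in the paper.
```python
# Purely illustrative: buffered asynchronous client deltas driving an
# AMSGrad-style adaptive step on the server. The buffering rule and all
# hyperparameters are assumptions, not FADAS's exact method.
import numpy as np

class AdaptiveAsyncServer:
    def __init__(self, model, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8, buffer_size=4):
        self.x = np.asarray(model, dtype=float)
        self.m = np.zeros_like(self.x)        # first moment
        self.v = np.zeros_like(self.x)        # second moment
        self.v_hat = np.zeros_like(self.x)    # running max of v (AMSGrad correction)
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.buffer, self.buffer_size = [], buffer_size

    def receive(self, delta):
        """Buffer one asynchronously arriving client delta; step once the buffer fills."""
        self.buffer.append(np.asarray(delta, dtype=float))
        if len(self.buffer) < self.buffer_size:
            return self.x
        g = np.mean(self.buffer, axis=0)      # buffered average as a pseudo-gradient
        self.buffer = []
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        self.v_hat = np.maximum(self.v_hat, self.v)
        self.x = self.x + self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)
        return self.x
```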
arXiv Detail & Related papers (2024-07-25T20:02:57Z) - FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting [9.261784956541641]
Asynchronous Federated Learning (AFL) methods have emerged as promising alternatives to their synchronous counterparts, which are bottlenecked by the slowest agent.
However, AFL biases model training heavily towards agents who can produce updates faster, leaving slower agents behind.
We introduce FedStaleWeight, an algorithm that addresses fairness in aggregating asynchronous client updates by employing average staleness to compute fair re-weightings (sketched below).
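A minimal sketch of the fairness idea, under the assumption that clients with higher average staleness are up-weighted so slower participants are not drowned out; the proportional rule below is illustrative, not FedStaleWeight's exact re-weighting formula.
```python
# Purely illustrative: up-weight clients with higher average staleness so slower
# participants are not drowned out. The proportional rule is an assumption, not
# FedStaleWeight's exact formula.
import numpy as np

def fair_staleness_weights(avg_staleness):
    s = np.asarray(avg_staleness, dtype=float) + 1.0   # +1 keeps fresh clients non-zero
    return s / s.sum()

def buffered_fair_step(global_model, client_deltas, avg_staleness, lr=1.0):
    """One buffered asynchronous aggregation step with staleness-fair weights."""
    w = fair_staleness_weights(avg_staleness)
    delta = sum(wi * di for wi, di in zip(w, client_deltas))
    return np.asarray(global_model, dtype=float) + lr * delta
```
Note the contrast with the robustness sketch under the main abstract: there, staler updates are discounted to limit delay-induced bias; here, staler clients are up-weighted to restore fairness.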
arXiv Detail & Related papers (2024-06-05T02:52:22Z) - Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training [8.419845742978985]
We investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems.
Our first contribution is the introduction of an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels.
Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process.
arXiv Detail & Related papers (2024-05-23T16:29:30Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - On the Role of Server Momentum in Federated Learning [85.54616432098706]
We propose a general framework for server momentum that covers a large class of momentum schemes previously unexplored in federated learning (FL); a sketch of the basic pattern appears below.
We provide rigorous convergence analysis for the proposed framework.
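For concreteness, the sketch below shows the simplest server-momentum pattern (a FedAvgM-style update that treats the averaged client deltas as a pseudo-gradient); the paper's framework generalizes this to a much broader class of momentum schemes, so this is an assumed baseline rather than the proposed method.
```python
# Minimal sketch of the basic server-momentum pattern (a FedAvgM-style update);
# the paper's framework generalizes this to a broader class of momentum schemes.
import numpy as np

class MomentumServer:
    def __init__(self, model, beta=0.9, lr=1.0):
        self.model = np.asarray(model, dtype=float)
        self.buf = np.zeros_like(self.model)   # server momentum buffer
        self.beta, self.lr = beta, lr

    def round_step(self, client_deltas):
        """Average this round's client deltas, fold them into momentum, update the model."""
        pseudo_grad = np.mean(np.asarray(client_deltas, dtype=float), axis=0)
        self.buf = self.beta * self.buf + pseudo_grad
        self.model = self.model + self.lr * self.buf
        return self.model
```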
arXiv Detail & Related papers (2023-12-19T23:56:49Z) - Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory [5.502596101979607]
We propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory.
Our framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks.
arXiv Detail & Related papers (2023-10-10T09:17:17Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed self-consistent robust error (SCORE) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization [13.755421424240048]
We propose a new algorithm, FedaGrac, which calibrates the local direction to a predictive global orientation.
We theoretically prove that FedaGrac achieves an improved order of convergence rate over the state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-17T07:26:31Z) - Learning from History for Byzantine Robust Optimization [52.68913869776858]
Byzantine robustness has received significant attention recently given its importance for distributed learning.
We show that most existing robust aggregation rules may not converge even in the absence of any Byzantine attackers.
arXiv Detail & Related papers (2020-12-18T16:22:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.