Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
- URL: http://arxiv.org/abs/2405.14759v3
- Date: Mon, 2 Sep 2024 04:51:17 GMT
- Title: Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
- Authors: Tehila Dahan, Kfir Y. Levy
- Abstract summary: We investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems.
Our first contribution is the introduction of an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels.
Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process.
- Score: 8.419845742978985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral to complex ML tasks, ensuring resilience against Byzantine failures, where workers may contribute incorrect updates due to malice or error, becomes paramount. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels while requiring only low computational overhead. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.
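To make the two ingredients above concrete, here is a minimal sketch, not the paper's pseudocode: a centered-trimming style meta-aggregation step that wraps a baseline robust aggregator, plus a STORM-style recursive-momentum update as an illustrative stand-in for one component of the double-momentum gradient estimator. The function names, the coordinate-wise-median baseline, and the trimming fraction `delta` are assumptions made for illustration; the paper's exact CTMA rule and estimator may differ.

```python
# Illustrative sketch only (not the paper's pseudocode): a centered-trimming
# style meta-aggregation step wrapping a baseline robust aggregator, plus a
# STORM-style recursive-momentum update as a stand-in for one component of a
# double-momentum gradient estimator. Names and the trimming rule are assumed.
import numpy as np


def coordinate_wise_median(messages: np.ndarray) -> np.ndarray:
    """Baseline robust aggregator: coordinate-wise median of worker messages."""
    return np.median(messages, axis=0)


def centered_trimmed_aggregate(messages: np.ndarray, delta: float,
                               baseline=coordinate_wise_median) -> np.ndarray:
    """Meta-aggregation sketch in the spirit of CTMA.

    1) Use the baseline aggregator's output as a center.
    2) Discard the ceil(delta * n) messages farthest from the center (L2 norm).
    3) Average the remaining messages.
    The paper's exact trimming/averaging rule may differ; the point is that the
    extra cost on top of the baseline aggregator is only distances plus a sort.
    """
    n = messages.shape[0]
    center = baseline(messages)
    dists = np.linalg.norm(messages - center, axis=1)
    keep = np.argsort(dists)[: n - int(np.ceil(delta * n))]
    return messages[keep].mean(axis=0)


def storm_style_momentum(d_prev: np.ndarray, grad_curr: np.ndarray,
                         grad_prev_same_sample: np.ndarray,
                         beta: float) -> np.ndarray:
    """One recursive-momentum step:
    d_t = g_t(x_t) + (1 - beta) * (d_{t-1} - g_t(x_{t-1})),
    where both gradients are computed on the same fresh sample."""
    return grad_curr + (1.0 - beta) * (d_prev - grad_prev_same_sample)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = rng.normal(loc=1.0, scale=0.1, size=(8, 5))  # honest worker gradients
    byzantine = np.full((2, 5), 50.0)                      # adversarial outliers
    msgs = np.vstack([honest, byzantine])
    print(centered_trimmed_aggregate(msgs, delta=0.25))    # close to the honest mean
```

In this sketch, the only overhead beyond the baseline aggregator is computing n distances and one sort, which is what makes the meta-aggregation step cheap relative to re-running a heavier robust aggregator.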
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE)
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML [8.419845742978985]
Asynchronous systems struggle with maintaining integrity against Byzantine failures.
We introduce a novel weighted robust aggregation framework to tackle these issues.
We achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment.
arXiv Detail & Related papers (2025-01-16T16:00:52Z)
- On the Robustness of Distributed Machine Learning against Transfer Attacks [1.0787328610467801]
No prior work has examined the combined robustness stemming from distributing both the learning and the inference process.
We show that properly distributed ML instantiations achieve across-the-board improvements in accuracy-robustness tradeoffs against state-of-the-art transfer-based attacks.
arXiv Detail & Related papers (2024-12-18T17:27:17Z)
- Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
Sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model.
We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies.
Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training [52.39882976848064]
We design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module.
Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to solve the optimal training and hyper-training variables together; a minimal sketch of the classical fixed-point iteration behind GKM follows below.
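As a point of reference for the GKM module above, this is a minimal sketch of the classical Krasnoselskii-Mann fixed-point iteration that GKM generalizes; the operator `T`, the step size `alpha`, and the affine example are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of the classical Krasnoselskii-Mann (KM) fixed-point iteration,
# the scheme that the Generalized KM (GKM) module builds on. The operator T,
# the step size alpha, and the affine example below are illustrative assumptions.
import numpy as np

def km_iterate(T, x0, alpha=0.5, steps=200):
    """Iterate x_{k+1} = (1 - alpha) * x_k + alpha * T(x_k) for a nonexpansive T."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = (1.0 - alpha) * x + alpha * T(x)
    return x

# Example: T(x) = A x + b with ||A|| < 1 is a contraction (hence nonexpansive);
# the KM iterates converge to its unique fixed point x* = (I - A)^{-1} b.
A = np.array([[0.5, 0.2], [0.1, 0.4]])
b = np.array([1.0, -1.0])
x_star = km_iterate(lambda x: A @ x + b, x0=np.zeros(2))
```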
arXiv Detail & Related papers (2022-06-16T01:50:25Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also outperforms large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- The reinforcement learning-based multi-agent cooperative approach for the adaptive speed regulation on a metallurgical pickling line [0.0]
The proposed approach combines mathematical modeling as a base algorithm and a cooperative Multi-Agent Reinforcement Learning system.
We demonstrate how Deep Q-Learning can be applied to a real-life task in heavy industry, resulting in a significant improvement over previously existing automation systems.
arXiv Detail & Related papers (2020-08-16T15:10:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.