Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
- URL: http://arxiv.org/abs/2405.14759v3
- Date: Mon, 2 Sep 2024 04:51:17 GMT
- Title: Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
- Authors: Tehila Dahan, Kfir Y. Levy
- Abstract summary: We investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems.
Our first contribution is the introduction of an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels.
Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process.
- Score: 8.419845742978985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral to complex ML tasks, ensuring resilience against Byzantine failures, where workers may contribute incorrect updates due to malice or error, becomes paramount. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels while requiring only low computational overhead. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.
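To make the two ingredients above concrete, here is a minimal sketch, not the paper's pseudocode: a centered-trimming style meta-aggregation step that wraps a baseline robust aggregator, plus a STORM-style recursive-momentum update as an illustrative stand-in for one component of the double-momentum gradient estimator. The function names, the coordinate-wise-median baseline, and the trimming fraction `delta` are assumptions made for illustration; the paper's exact CTMA rule and estimator may differ.

```python
# Illustrative sketch only (not the paper's pseudocode): a centered-trimming
# style meta-aggregation step wrapping a baseline robust aggregator, plus a
# STORM-style recursive-momentum update as a stand-in for one component of a
# double-momentum gradient estimator. Names and the trimming rule are assumed.
import numpy as np


def coordinate_wise_median(messages: np.ndarray) -> np.ndarray:
    """Baseline robust aggregator: coordinate-wise median of worker messages."""
    return np.median(messages, axis=0)


def centered_trimmed_aggregate(messages: np.ndarray, delta: float,
                               baseline=coordinate_wise_median) -> np.ndarray:
    """Meta-aggregation sketch in the spirit of CTMA.

    1) Use the baseline aggregator's output as a center.
    2) Discard the ceil(delta * n) messages farthest from the center (L2 norm).
    3) Average the remaining messages.
    The paper's exact trimming/averaging rule may differ; the point is that the
    extra cost on top of the baseline aggregator is only distances plus a sort.
    """
    n = messages.shape[0]
    center = baseline(messages)
    dists = np.linalg.norm(messages - center, axis=1)
    keep = np.argsort(dists)[: n - int(np.ceil(delta * n))]
    return messages[keep].mean(axis=0)


def storm_style_momentum(d_prev: np.ndarray, grad_curr: np.ndarray,
                         grad_prev_same_sample: np.ndarray,
                         beta: float) -> np.ndarray:
    """One recursive-momentum step:
    d_t = g_t(x_t) + (1 - beta) * (d_{t-1} - g_t(x_{t-1})),
    where both gradients are computed on the same fresh sample."""
    return grad_curr + (1.0 - beta) * (d_prev - grad_prev_same_sample)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = rng.normal(loc=1.0, scale=0.1, size=(8, 5))  # honest worker gradients
    byzantine = np.full((2, 5), 50.0)                      # adversarial outliers
    msgs = np.vstack([honest, byzantine])
    print(centered_trimmed_aggregate(msgs, delta=0.25))    # close to the honest mean
```

In this sketch, the only overhead beyond the baseline aggregator is computing n distances and one sort, which is what makes the meta-aggregation step cheap relative to re-running a heavier robust aggregator.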
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE)
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML [8.419845742978985]
Asynchronous systems struggle with maintaining integrity against Byzantine failures.
We introduce a novel weighted robust aggregation framework to tackle these issues.
We achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment.
arXiv Detail & Related papers (2025-01-16T16:00:52Z)
- On the Robustness of Distributed Machine Learning against Transfer Attacks [1.0787328610467801]
No prior work has examined the combined robustness stemming from distributing both the learning and the inference process.
We show that properly distributed ML instantiations achieve across-the-board improvements in accuracy-robustness tradeoffs against state-of-the-art transfer-based attacks.
arXiv Detail & Related papers (2024-12-18T17:27:17Z)
- Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
Sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model.
We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies.
Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training [52.39882976848064]
We design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module.
Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to solve the optimal training and hyper-training variables together; a minimal sketch of the classical fixed-point iteration behind GKM follows below.
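As a point of reference for the GKM module above, this is a minimal sketch of the classical Krasnoselskii-Mann fixed-point iteration that GKM generalizes; the operator `T`, the step size `alpha`, and the affine example are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of the classical Krasnoselskii-Mann (KM) fixed-point iteration,
# the scheme that the Generalized KM (GKM) module builds on. The operator T,
# the step size alpha, and the affine example below are illustrative assumptions.
import numpy as np

def km_iterate(T, x0, alpha=0.5, steps=200):
    """Iterate x_{k+1} = (1 - alpha) * x_k + alpha * T(x_k) for a nonexpansive T."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = (1.0 - alpha) * x + alpha * T(x)
    return x

# Example: T(x) = A x + b with ||A|| < 1 is a contraction (hence nonexpansive);
# the KM iterates converge to its unique fixed point x* = (I - A)^{-1} b.
A = np.array([[0.5, 0.2], [0.1, 0.4]])
b = np.array([1.0, -1.0])
x_star = km_iterate(lambda x: A @ x + b, x0=np.zeros(2))
```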
arXiv Detail & Related papers (2022-06-16T01:50:25Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also outperforms large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- The reinforcement learning-based multi-agent cooperative approach for the adaptive speed regulation on a metallurgical pickling line [0.0]
The proposed approach combines mathematical modeling as a base algorithm and a cooperative Multi-Agent Reinforcement Learning system.
We demonstrate how Deep Q-Learning can be applied to a real-life task in heavy industry, resulting in a significant improvement over previously existing automation systems.
arXiv Detail & Related papers (2020-08-16T15:10:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.