FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
- URL: http://arxiv.org/abs/2504.20446v1
- Date: Tue, 29 Apr 2025 05:44:59 GMT
- Title: FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
- Authors: Wenjing Xiao, Wenhao Song, Miaojiang Chen, Ruikun Luo, Min Chen
- Abstract summary: We propose FT-MoE, a sustainable-learning mixture-of-experts model for fault-tolerant computing with multiple tasks. We present dual mixture-of-experts networks for highly accurate prediction on both fault detection and classification tasks. We conduct extensive experiments on the FT benchmark to verify the effectiveness of FT-MoE.
- Score: 5.271397717002302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent fault-tolerant (FT) computing has recently demonstrated significant advantages in predicting and diagnosing faults in advance, enabling reliable service delivery. However, due to the heterogeneity of fault knowledge and the complex dependencies in time-series log data, existing deep learning-based FT algorithms struggle to further improve detection performance with a single neural network model. To this end, we propose FT-MoE, a sustainable-learning mixture-of-experts model for fault-tolerant computing with multiple tasks, in which different parameters learn distinct fault knowledge to achieve high reliability for service systems. First, we use decoder-based transformer models to obtain fault prototype vectors that decouple long-distance dependencies. Next, we present dual mixture-of-experts networks for highly accurate prediction on both fault detection and classification tasks. We then design a two-stage optimization scheme of offline training and online tuning, which allows FT-MoE to keep learning during operation and adapt to dynamic service environments. Finally, to verify the effectiveness of FT-MoE, we conduct extensive experiments on the FT benchmark. Experimental results show that FT-MoE achieves superior performance compared to state-of-the-art methods. Code will be available upon publication.
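Since the code is not yet released, the following is only a minimal PyTorch-style sketch of the dual mixture-of-experts idea the abstract describes: a transformer backbone produces a fault prototype vector, and two top-k-gated expert heads score it for detection and classification. Every size, name, and routing choice below (d_model, top-2 gating, MoEHead, FTMoE, ...) is an illustrative assumption, not the authors' implementation.

```python
# Hedged sketch only: all module names, sizes, and the routing scheme are
# assumptions; FT-MoE's actual architecture is described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEHead(nn.Module):
    """Top-k gated mixture of expert MLPs over a shared prototype vector."""
    def __init__(self, d_model, n_experts, n_out, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, n_out))
            for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, h):                              # h: (batch, d_model)
        top_val, top_idx = self.gate(h).topk(self.k, dim=-1)
        weights = F.softmax(top_val, dim=-1)           # weights of chosen experts
        all_out = torch.stack([e(h) for e in self.experts], dim=1)  # (B, E, out)
        idx = top_idx.unsqueeze(-1).expand(-1, -1, all_out.size(-1))
        return (weights.unsqueeze(-1) * all_out.gather(1, idx)).sum(dim=1)

class FTMoE(nn.Module):
    """Transformer backbone plus dual MoE heads for detection/classification."""
    def __init__(self, d_model=128, n_classes=10, n_experts=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # A causal mask would make this decoder-style as in the abstract;
        # omitted here for brevity.
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.detect = MoEHead(d_model, n_experts, n_out=2)       # fault yes/no
        self.classify = MoEHead(d_model, n_experts, n_out=n_classes)

    def forward(self, x):                              # x: (batch, seq, d_model)
        proto = self.backbone(x)[:, -1]                # "fault prototype" vector
        return self.detect(proto), self.classify(proto)
```

The two-stage scheme in the abstract would then train this offline and keep tuning the gates and experts online; that loop is omitted here.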
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, which enables better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
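As a rough illustration of the normalized gradient-difference idea in the entry above, the sketch below differences two per-objective gradients after scaling each to unit norm; the exact NGDiff update and its adaptive learning-rate schedule are in the paper, and the fixed learning rate and global-norm scaling here are simplifying assumptions.

```python
# Hedged sketch: not the paper's NGDiff implementation, only the idea of
# normalizing each objective's gradient before differencing them.
import torch

def ngdiff_step(params, loss_retain, loss_forget, lr=1e-3):
    g_r = torch.autograd.grad(loss_retain, params, retain_graph=True)
    g_f = torch.autograd.grad(loss_forget, params)
    # Normalize each objective's gradient so neither dominates the step.
    n_r = torch.sqrt(sum((g ** 2).sum() for g in g_r)) + 1e-12
    n_f = torch.sqrt(sum((g ** 2).sum() for g in g_f)) + 1e-12
    with torch.no_grad():
        for p, gr, gf in zip(params, g_r, g_f):
            # Descend on the retain loss, ascend on the forget loss.
            p -= lr * (gr / n_r - gf / n_f)
```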
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning [28.12788291168137]
We present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
arXiv Detail & Related papers (2023-11-04T02:22:40Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- An Incomplete Tensor Tucker decomposition based Traffic Speed Prediction Method [0.0]
This work integrates the unique advantages of the proportional-integral-derivative (PID) controller into a Tucker decomposition-based latent factorization of tensors (LFT) model.
Experiments on two major city traffic road speed datasets show that the proposed model achieves significant efficiency gain and highly competitive prediction accuracy.
arXiv Detail & Related papers (2023-04-21T13:59:28Z)
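The entry above couples a PID controller with a Tucker-decomposition-based LFT model; the fragment below only illustrates the generic PID part, i.e., augmenting an instantaneous training residual with integral and derivative terms. The gains and the way the adjusted error would feed the factor updates are assumptions.

```python
# Generic PID controller over a training residual; a standard construction,
# not the cited model's actual update rule.
class PIDError:
    """Augment an instantaneous residual with integral and derivative terms."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev = 0.0, 0.0

    def __call__(self, err):
        self.integral += err              # accumulated (integral) error
        deriv = err - self.prev           # error change (derivative)
        self.prev = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```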
- DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model [12.335763358698564]
We propose DeepFT to proactively avoid system overloads and their adverse effects.
DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system.
It offers a highly scalable solution, as the model size grows by only 3 and 1 percent per unit increase in the number of active tasks and hosts, respectively.
arXiv Detail & Related papers (2022-12-02T16:51:58Z)
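A minimal sketch of the proactive, surrogate-driven decision loop the DeepFT entry describes: a learned surrogate scores candidate placements by predicted fault risk and the scheduler picks the safest. The feature encoding and the surrogate's interface here are assumed, not DeepFT's actual design.

```python
# Hedged sketch of surrogate-based proactive scheduling, not DeepFT itself.
import torch

def pick_placement(surrogate, state, candidates):
    """Choose the candidate placement with the lowest predicted fault risk."""
    risks = [surrogate(torch.cat([state, action])).item() for action in candidates]
    return candidates[min(range(len(candidates)), key=risks.__getitem__)]
```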
- Automatic inference of fault tree models via multi-objective evolutionary algorithms [1.189955933770711]
Fault tree analysis is a well-known technique in reliability engineering and risk assessment.
Traditionally, fault tree models are built manually with domain experts, a time-consuming process that is prone to human error.
With Industry 4.0, inspection and monitoring data are increasingly available, making techniques that extract knowledge from large data sets relevant.
We propose a data-driven approach to infer efficient fault tree structures that achieve a complete representation of the failure mechanisms contained in the failure data set without human intervention.
arXiv Detail & Related papers (2022-04-06T13:19:41Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator has gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs and verify the originally claimed properties.
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
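For context on the entry above, this is the standard straight-through estimator it analyzes, written as a custom autograd function: hard threshold on the forward pass, identity gradient on the backward pass (the source of the bias the paper studies).

```python
# The classic straight-through trick, not code from the cited paper.
import torch

class BinarizeST(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()           # hard threshold in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                  # identity gradient: the ST "bias"

x = torch.randn(4, requires_grad=True)
BinarizeST.apply(x).sum().backward()     # x.grad is all ones
```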
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks thanks to their large parameter capacity, but they also incur huge computation costs.
We explore accelerating large-model inference through conditional computation based on the sparse-activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
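A toy sketch of the MoEfication recipe described above, under the assumption that expert construction is a naive contiguous split of the FFN's hidden neurons (the paper studies better split strategies): only the top-scoring neuron groups are evaluated at inference, exploiting sparse activation.

```python
# Hedged sketch: contiguous neuron split and a given router are assumptions.
import torch

def moefy_ffn(w_in, w_out, n_experts):
    """w_in: (hidden, d) and w_out: (d, hidden) weights of a trained FFN."""
    return list(zip(w_in.chunk(n_experts, dim=0), w_out.chunk(n_experts, dim=1)))

def moe_forward(x, experts, router, k=2):
    top = router(x).topk(k).indices          # pick k expert groups for this x
    out = torch.zeros_like(x)
    for i in top:                            # only k of the groups are computed
        w_i, w_o = experts[i]
        out = out + torch.relu(x @ w_i.T) @ w_o.T
    return out
```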
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
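A compact sketch of the contextual-bandit selection loop described above: a small policy network chooses among K detectors of increasing complexity (one per HEC layer) and is trained with a REINFORCE-style update. The context size, reward shaping, and K are placeholder assumptions.

```python
# Hedged sketch of contextual-bandit model selection via a policy network.
import torch
import torch.nn as nn

K = 3                                            # detectors: device/edge/cloud
policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, K))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_and_update(context, reward_fn):
    probs = torch.softmax(policy(context), dim=-1)
    dist = torch.distributions.Categorical(probs)
    arm = dist.sample()                          # which detector to run
    reward = reward_fn(arm.item())               # e.g. accuracy - latency cost
    loss = -dist.log_prob(arm) * reward          # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
    return arm.item()
```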
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)