Related papers: Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures

URL: http://arxiv.org/abs/2505.20705v1
Date: Tue, 27 May 2025 04:31:12 GMT
Title: Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures
Authors: Yang Wang, Wenxuan Zhu, Xuehui Quan, Heyi Wang, Chang Liu, Qiyuan Wu
Abstract summary: This paper addresses the challenges of fault prediction and delayed response in distributed systems. We use a Gated Recurrent Unit to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model's ability to identify potential faults.
Score: 5.572536027964037
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper addresses the challenges of fault prediction and delayed response in distributed systems by proposing an intelligent prediction method based on temporal feature learning. The method takes multi-dimensional performance metric sequences as input. We use a Gated Recurrent Unit (GRU) to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model's ability to identify potential faults. On this basis, a feedforward neural network is designed to perform the final classification, enabling early warning of system failures. To validate the effectiveness of the proposed approach, comparative experiments and ablation analyses were conducted using data from a large-scale real-world cloud system. The experimental results show that the model outperforms various mainstream time-series models in terms of Accuracy, F1-Score, and AUC. This demonstrates strong prediction capability and stability. Furthermore, the loss function curve confirms the convergence and reliability of the training process. It indicates that the proposed method effectively learns system behavior patterns and achieves efficient fault detection.
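Illustrative sketch (not the authors' implementation): the pipeline described in the abstract, a GRU over metric sequences, attention over the hidden states, then a feedforward classifier, can be traced in a few lines of plain Python. Everything below is an assumption made for illustration: the function names, the layer sizes, the dot-product attention scoring, the omitted bias terms, and the random untrained weights. The abstract does not specify any of these details.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(n_metrics, n_hidden, seed=0):
    # Hypothetical, untrained weights; biases are omitted for brevity.
    rng = random.Random(seed)
    p = {}
    for name in ("Wz", "Wr", "Wn"):
        p[name] = rand_matrix(n_hidden, n_metrics, rng)
    for name in ("Uz", "Ur", "Un"):
        p[name] = rand_matrix(n_hidden, n_hidden, rng)
    p["v"] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]   # attention vector
    p["W1"] = rand_matrix(n_hidden, n_hidden, rng)               # classifier hidden layer
    p["w2"] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # classifier output layer
    return p

def gru_step(x, h, p):
    # One GRU update: update gate z, reset gate r, candidate state n.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def attention_pool(states, v):
    # Score each time step, softmax the scores, return the weighted sum of states.
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(len(v))]
    return context, weights

def predict_fault(sequence, p):
    # sequence: list of time steps, each a list of performance metrics.
    h = [0.0] * len(p["v"])
    states = []
    for x in sequence:
        h = gru_step(x, h, p)
        states.append(h)
    context, weights = attention_pool(states, p["v"])
    hidden = [max(0.0, a) for a in matvec(p["W1"], context)]  # ReLU layer
    logit = sum(w * a for w, a in zip(p["w2"], hidden))
    return sigmoid(logit), weights

params = init_params(n_metrics=3, n_hidden=4)
window = [[0.2, 0.1, 0.0], [0.4, 0.3, 0.1], [0.9, 0.8, 0.7]]  # made-up metric values
prob, weights = predict_fault(window, params)
print("fault probability:", prob)
print("attention weights per time step:", weights)
```

The attention weights sum to 1 across time steps, so they can be read as a rough indication of which part of the window the classifier relied on. With untrained weights the probability itself is meaningless; a real model would learn all parameters from labeled fault data.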

Related papers

Cloud-Based AI Systems: Leveraging Large Language Models for Intelligent Fault Detection and Autonomous Self-Healing [1.819979627431298]
We propose a novel AI framework based on a Large Language Model (LLM) for intelligent fault detection and self-healing mechanisms in cloud systems. The proposed model is significantly better than the traditional fault detection system in terms of fault detection accuracy, system downtime reduction, and recovery speed.
arXiv Detail & Related papers (2025-05-16T23:02:57Z)
Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments [5.853391005435494]
This study proposes a novel adaptive fault tolerance mechanism to ensure the reliability and availability of large-scale language models in cloud computing scenarios. It builds upon known fault-tolerant mechanisms, such as checkpointing, redundancy, and state transposition, introducing dynamic resource allocation and prediction of failure based on real-time performance metrics.
arXiv Detail & Related papers (2025-03-15T18:45:33Z)
Causal Interventional Prediction System for Robust and Explainable Effect Forecasting [14.104665282086339]
We explore the robustness and explainability of AI-based forecasting systems. We design a causal interventional prediction system (CIPS) based on a variational autoencoder and fully conditional specification of multiple imputations.
arXiv Detail & Related papers (2024-07-29T04:16:45Z)
Root Causing Prediction Anomalies Using Explainable AI [3.970146574042422]
We present a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models. A single feature corruption can cause cascading feature, label and concept drifts. We have successfully applied this technique to improve the reliability of models used in personalized advertising.
arXiv Detail & Related papers (2024-03-04T19:38:50Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation. We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience. Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions. For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions. The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction. We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
Cloud Failure Prediction with Hierarchical Temporary Memory: An Empirical Assessment [64.73243241568555]
Hierarchical Temporary Memory (HTM) is an unsupervised learning algorithm inspired by the features of the neocortex. This paper presents the first systematic study that assesses HTM in the context of failure prediction.
arXiv Detail & Related papers (2021-10-06T07:09:45Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system. We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony. Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results. Our results highlight that the correct choice of regularization and inference methods is important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.