Robust Multimodal Failure Detection for Microservice Systems
- URL: http://arxiv.org/abs/2305.18985v1
- Date: Tue, 30 May 2023 12:39:42 GMT
- Title: Robust Multimodal Failure Detection for Microservice Systems
- Authors: Chenyu Zhao, Minghua Ma, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan,
Xiao Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, Dan Pei, Qingwei
Lin, Dongmei Zhang
- Abstract summary: AnoFusion is an unsupervised failure detection approach for microservice systems.
It learns the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) and Gated Recurrent Unit (GRU)
It achieves the F1-score of 0.857 and 0.922, respectively, outperforming state-of-the-art failure detection approaches.
- Score: 32.25907616511765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proactive failure detection of instances is vitally essential to microservice
systems because an instance failure can propagate to the whole system and
degrade the system's performance. Over the years, many single-modal (i.e.,
metrics, logs, or traces) data-based nomaly detection methods have been
proposed. However, they tend to miss a large number of failures and generate
numerous false alarms because they ignore the correlation of multimodal data.
In this work, we propose AnoFusion, an unsupervised failure detection approach,
to proactively detect instance failures through multimodal data for
microservice systems. It applies a Graph Transformer Network (GTN) to learn the
correlation of the heterogeneous multimodal data and integrates a Graph
Attention Network (GAT) with Gated Recurrent Unit (GRU) to address the
challenges introduced by dynamically changing multimodal data. We evaluate the
performance of AnoFusion through two datasets, demonstrating that it achieves
the F1-score of 0.857 and 0.922, respectively, outperforming the
state-of-the-art failure detection approaches.
Related papers
- TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data [11.373761837547852]
Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale.
Traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information.
We propose textitTVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types.
arXiv Detail & Related papers (2024-07-29T05:26:57Z) - CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems [22.00860661894853]
We propose a Causal Heterogeneous grAph baSed framEwork for root cause analysis, namely CHASE, for microservice systems with multimodal data.
CHASE learns from the constructed hypergraph with hyperedges representing the flow of causality and performs root cause localization.
arXiv Detail & Related papers (2024-06-28T07:46:51Z) - Twin Graph-based Anomaly Detection via Attentive Multi-Modal Learning
for Microservice System [24.2074235652359]
We propose MSTGAD, which seamlessly integrates all available data modalities via attentive multi-modal learning.
We construct a transformer-based neural network with both spatial and temporal attention mechanisms to model the inter-correlations between different modalities.
This enables us to detect anomalies automatically and accurately in real-time.
arXiv Detail & Related papers (2023-10-07T06:28:41Z) - Robust Failure Diagnosis of Microservice System through Multimodal Data [14.720995687799668]
We propose DiagFusion, a robust failure diagnosis approach that uses multimodal data.
Our evaluations show that DiagFusion outperforms existing methods in terms of root cause instance localization and failure type determination.
arXiv Detail & Related papers (2023-02-21T08:28:28Z) - Heterogeneous Anomaly Detection for Software Systems via Semi-supervised
Cross-modal Attention [29.654681594903114]
We propose Hades, the first end-to-end semi-supervised approach to identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns.
We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud.
arXiv Detail & Related papers (2023-02-14T09:02:11Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - MMRNet: Improving Reliability for Multimodal Object Detection and
Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet)
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
arXiv Detail & Related papers (2022-10-19T19:15:07Z) - An Outlier Exposure Approach to Improve Visual Anomaly Detection
Performance for Mobile Robots [76.36017224414523]
We consider the problem of building visual anomaly detection systems for mobile robots.
Standard anomaly detection models are trained using large datasets composed only of non-anomalous data.
We tackle the problem of exploiting these data to improve the performance of a Real-NVP anomaly detection model.
arXiv Detail & Related papers (2022-09-20T15:18:13Z) - Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z) - DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly
detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE)
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs)
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.