Robust Multimodal Failure Detection for Microservice Systems
- URL: http://arxiv.org/abs/2305.18985v1
- Date: Tue, 30 May 2023 12:39:42 GMT
- Title: Robust Multimodal Failure Detection for Microservice Systems
- Authors: Chenyu Zhao, Minghua Ma, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan,
Xiao Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, Dan Pei, Qingwei
Lin, Dongmei Zhang
- Abstract summary: AnoFusion is an unsupervised failure detection approach for microservice systems.
It applies a Graph Transformer Network (GTN) to learn the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) with a Gated Recurrent Unit (GRU).
On two datasets, it achieves F1-scores of 0.857 and 0.922, respectively, outperforming state-of-the-art failure detection approaches.
- Score: 32.25907616511765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proactive failure detection of instances is essential to microservice
systems because an instance failure can propagate to the whole system and
degrade the system's performance. Over the years, many single-modal (i.e.,
metrics, logs, or traces) data-based anomaly detection methods have been
proposed. However, they tend to miss a large number of failures and generate
numerous false alarms because they ignore the correlation of multimodal data.
In this work, we propose AnoFusion, an unsupervised failure detection approach,
to proactively detect instance failures through multimodal data for
microservice systems. It applies a Graph Transformer Network (GTN) to learn the
correlation of the heterogeneous multimodal data and integrates a Graph
Attention Network (GAT) with Gated Recurrent Unit (GRU) to address the
challenges introduced by dynamically changing multimodal data. We evaluate the
performance of AnoFusion on two datasets, demonstrating that it achieves
F1-scores of 0.857 and 0.922, respectively, outperforming
state-of-the-art failure detection approaches.
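For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed from binary failure labels. The function name `f1_score` and the ten example labels are hypothetical and chosen only for illustration. They are not the paper's data or its evaluation code, and the paper may use additional conventions (such as window-level adjustment) that are not shown here.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the F1-score for binary labels (1 = failure, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correct detections: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for ten time windows (not data from the paper).
y_true = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
print(round(f1_score(y_true, y_pred), 3))  # prints 0.75
```

In this toy case there are three true positives, one false alarm, and one missed failure, so precision and recall are both 0.75. Because F1 penalizes both missed failures and false alarms, it suits the problem the abstract describes.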
Related papers
- CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems [22.00860661894853]
We propose a Causal Heterogeneous grAph baSed framEwork for root cause analysis, namely CHASE, for microservice systems with multimodal data.
CHASE learns from the constructed hypergraph with hyperedges representing the flow of causality and performs root cause localization.
arXiv Detail & Related papers (2024-06-28T07:46:51Z)
- Twin Graph-based Anomaly Detection via Attentive Multi-Modal Learning for Microservice System [24.2074235652359]
We propose MSTGAD, which seamlessly integrates all available data modalities via attentive multi-modal learning.
We construct a transformer-based neural network with both spatial and temporal attention mechanisms to model the inter-correlations between different modalities.
This enables us to detect anomalies automatically and accurately in real-time.
arXiv Detail & Related papers (2023-10-07T06:28:41Z)
- Efficient pattern-based anomaly detection in a network of multivariate devices [0.17188280334580192]
We propose a scalable approach to detect anomalies using a two-step approach.
First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process.
Next, we report anomalies based on an embedding of sequential patterns.
arXiv Detail & Related papers (2023-05-07T16:05:30Z)
- Robust Failure Diagnosis of Microservice System through Multimodal Data [14.720995687799668]
We propose DiagFusion, a robust failure diagnosis approach that uses multimodal data.
Our evaluations show that DiagFusion outperforms existing methods in terms of root cause instance localization and failure type determination.
arXiv Detail & Related papers (2023-02-21T08:28:28Z)
- Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention [29.654681594903114]
We propose Hades, the first end-to-end semi-supervised approach to identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns.
We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud.
arXiv Detail & Related papers (2023-02-14T09:02:11Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet).
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
arXiv Detail & Related papers (2022-10-19T19:15:07Z)
- An Outlier Exposure Approach to Improve Visual Anomaly Detection Performance for Mobile Robots [76.36017224414523]
We consider the problem of building visual anomaly detection systems for mobile robots.
Standard anomaly detection models are trained using large datasets composed only of non-anomalous data.
We tackle the problem of exploiting these data to improve the performance of a Real-NVP anomaly detection model.
arXiv Detail & Related papers (2022-09-20T15:18:13Z)
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
- From Unsupervised to Few-shot Graph Anomaly Detection: A Multi-scale Contrastive Learning Approach [49.439021563395976]
Anomaly detection from graph data is an important data mining task in many applications such as social networks, finance, and e-commerce.
We propose a novel framework, graph ANomaly dEtection framework with Multi-scale cONtrastive lEarning (ANEMONE for short).
By using a graph neural network as a backbone to encode the information from multiple graph scales (views), we learn better representation for nodes in a graph.
arXiv Detail & Related papers (2022-02-11T09:45:11Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.