Related papers: Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

URL: http://arxiv.org/abs/2307.10869v2
Date: Tue, 1 Aug 2023 07:04:29 GMT
Title: Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection
Authors: Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang and Michael Lyu
Abstract summary: Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify these issues using service monitoring metrics. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies.
Score: 5.473091770227683
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational- Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which will further help pinpoint the anomalous metrics that may cause the anomaly effectively. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.

Related papers

ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection [52.228708947607636]
This paper proposes a comprehensive visual anomaly detection benchmark, textbftextitADer, which is a modular framework for new anomaly detection methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly Detection focuses on identifying samples that deviate from the norm. Recent advances in self-supervised learning have shown great promise in this regard. We propose Con2, which addresses this problem by setting normal training data into distinct contexts. Our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection. Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD)
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
Self-supervised Learning for Anomaly Detection in Computational Workflows [10.39119516144685]
We introduce an autoencoder-driven self-supervised learning(SSL) approach that learns a summary statistic from unlabeled workflow data. In this approach, we combine generative and contrastive learning objectives to detect outliers in the summary statistics. We demonstrate that by estimating the distribution of normal behavior in the latent space, we can outperform state-of-the-art anomaly detection methods on our benchmark datasets.
arXiv Detail & Related papers (2023-10-02T14:31:56Z)
Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services [29.37493773435177]
CMAnomaly is an anomaly detection framework on multivariate monitoring metrics based on collaborative machine. The proposed framework is extensively evaluated with both public data and industrial data collected from a large-scale online service system of Huawei Cloud. Compared with state-of-the-art baseline models, CMAnomaly achieves an average F1 score of 0.9494, outperforming baselines by 6.77% to 10.68%, and runs 10X to 20X faster.
arXiv Detail & Related papers (2023-08-19T08:08:05Z)
WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data. This article proposed the WePaMaDM-Outlier Detection with distinct mass data mining domain. It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data [13.864161788250856]
TranAD is a deep transformer network based anomaly detection and diagnosis model. It uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training.
arXiv Detail & Related papers (2022-01-18T19:41:29Z)
Graph Neural Network-Based Anomaly Detection in Multivariate Time Series [17.414474298706416]
We develop a new way to detect anomalies in high-dimensional time series data. Our approach combines a structure learning approach with graph neural networks. We show that our method detects anomalies more accurately than baseline approaches.
arXiv Detail & Related papers (2021-06-13T09:07:30Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance. One direction aims at the recognition of re-occurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs) To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.