Related papers: Online Self-Evolving Anomaly Detection in Cloud Computing Environments

URL: http://arxiv.org/abs/2111.08232v1
Date: Tue, 16 Nov 2021 05:13:38 GMT
Title: Online Self-Evolving Anomaly Detection in Cloud Computing Environments
Authors: Haili Wang, Jingda Guo, Xu Ma, Song Fu, Qing Yang, Yunzhong Xu
Abstract summary: We present a self-evolving anomaly detection (SEAD) framework for cloud dependability assurance. Our framework self-evolves by exploring newly verified anomaly records and continuously updating the anomaly detector online. Our detectors can achieve 88.94% in sensitivity and 94.60% in specificity on average, which makes them suitable for real-world deployment.
Score: 6.480575492140354
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Modern cloud computing systems contain hundreds to thousands of computing and storage servers. Such a scale, combined with ever-growing system complexity, is causing a key challenge to failure and resource management for dependable cloud computing. Autonomic failure detection is a crucial technique for understanding emergent, cloud-wide phenomena and self-managing cloud resources for system-level dependability assurance. To detect failures, we need to monitor the cloud execution and collect runtime performance data. These data are usually unlabeled, and thus a prior failure history is not always available in production clouds. In this paper, we present a self-evolving anomaly detection (SEAD) framework for cloud dependability assurance. Our framework self-evolves by recursively exploring newly verified anomaly records and continuously updating the anomaly detector online. As a distinct advantage of our framework, cloud system administrators only need to check a small number of detected anomalies, and their decisions are leveraged to update the detector. Thus, the detector evolves following the upgrade of system hardware, update of the software stack, and change of user workloads. Moreover, we design two types of detectors, one for general anomaly detection and the other for type-specific anomaly detection. With the help of self-evolving techniques, our detectors can achieve 88.94% in sensitivity and 94.60% in specificity on average, which makes them suitable for real-world deployment.
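The two figures quoted in the abstract are standard confusion-matrix ratios: sensitivity is the share of true anomalies that the detector flags, and specificity is the share of normal records that it leaves unflagged. The sketch below only illustrates those definitions. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real anomalies that were flagged: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no anomalous records to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of normal records that were not flagged: TN / (TN + FP)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no normal records to evaluate")
    return true_neg / total


# Hypothetical counts, chosen only to show the arithmetic (not the paper's data).
example_sensitivity = sensitivity(true_pos=80, false_neg=20)
example_specificity = specificity(true_neg=950, false_pos=50)
print(f"sensitivity={example_sensitivity:.2%}, specificity={example_specificity:.2%}")
```

Reporting both ratios matters for anomaly detection because anomalies are rare: a detector that never raises an alarm would score high on specificity while its sensitivity would be zero.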

Related papers

Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset [1.293050392312921]
We introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. We demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process.
arXiv Detail & Related papers (2024-11-13T22:04:19Z)
MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time Series [11.754433499581879]
A faulty code change may degrade the target service's performance and cause cascading outages in downstream services. In this paper, we study the problem of anomaly detection for deployments. We propose a novel framework, semi-supervised hybrid Model for Entity-Level Online Detection of anomalY (MELODY).
arXiv Detail & Related papers (2024-01-18T19:02:41Z)
MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures [3.04585143845864]
MoniLog is a distributed approach to detect real-time anomalies within large-scale environments. It aims to detect sequential and quantitative anomalies within a multi-source log stream.
arXiv Detail & Related papers (2023-04-24T09:21:52Z)
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer. The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data. A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
CloudShield: Real-time Anomaly Detection in the Cloud [8.406912571507569]
CloudShield is a real-time anomaly and attack detection system for cloud computing. It distinguishes between benign programs, known attacks, and zero-day attacks. It significantly reduces false alarms by up to 99.0%.
arXiv Detail & Related papers (2021-08-20T03:14:18Z)
Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies [58.88325379746632]
We present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies as edges to improve the identification and localization of anomalies. Given a series of metrics, our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is detected. The evaluation shows the generally good prediction performance of Arvalus and reveals the advantage of D-Arvalus, which incorporates information about system component dependencies.
arXiv Detail & Related papers (2021-03-09T06:34:05Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance. One direction aims at the recognition of re-occurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
Anomaly Detection in a Large-scale Cloud Platform [9.283888139549067]
Cloud computing is ubiquitous: more and more companies are moving their workloads into the cloud. Service providers need to monitor the quality of their ever-growing offerings effectively. We designed and implemented an automated monitoring system for the IBM Cloud Platform.
arXiv Detail & Related papers (2020-10-21T12:58:36Z)
Adaptive Anomaly Detection for IoT Data in Hierarchical Edge Computing [71.86955275376604]
We propose an adaptive anomaly detection approach for hierarchical edge computing (HEC) systems to solve this problem. We design an adaptive scheme to select one of the models based on the contextual information extracted from input data, to perform anomaly detection. We evaluate our proposed approach using a real IoT dataset, and demonstrate that it reduces detection delay by 84% while maintaining almost the same accuracy as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-01-10T05:29:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.