Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset
- URL: http://arxiv.org/abs/2411.09047v1
- Date: Wed, 13 Nov 2024 22:04:19 GMT
- Title: Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset
- Authors: Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan, John Steinbacher, Andriy Miranskyy,
- Abstract summary: We introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console.
This dataset comprises 39,365 rows and 117,448 columns of telemetry data.
We demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process.
- Score: 1.293050392312921
- License:
- Abstract: As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
Related papers
- Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight [12.272468397322738]
We present Atlas, a novel approach to automatically synthesizing causal graphs for cloud systems.
We evaluate Atlas across a range of fault localization scenarios and demonstrate that Atlas is capable of generating causal graphs in a scalable and generalizable manner.
arXiv Detail & Related papers (2024-07-11T17:31:12Z) - ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point
Cloud Classification [28.05358017259757]
We propose ModelNet-O, a large-scale synthetic dataset of 123,041 samples.
ModelNet-O emulates real-world point clouds with self-occlusion caused by scanning from monocular cameras.
We propose a robust point cloud processing method called PointMLS.
arXiv Detail & Related papers (2024-01-16T08:54:21Z) - A Survey of Label-Efficient Deep Learning for 3D Point Clouds [109.07889215814589]
This paper presents the first comprehensive survey of label-efficient learning of point clouds.
We propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels.
For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges.
arXiv Detail & Related papers (2023-05-31T12:54:51Z) - An Outlier Exposure Approach to Improve Visual Anomaly Detection
Performance for Mobile Robots [76.36017224414523]
We consider the problem of building visual anomaly detection systems for mobile robots.
Standard anomaly detection models are trained using large datasets composed only of non-anomalous data.
We tackle the problem of exploiting these data to improve the performance of a Real-NVP anomaly detection model.
arXiv Detail & Related papers (2022-09-20T15:18:13Z) - Exploring Scalable, Distributed Real-Time Anomaly Detection for Bridge
Health Monitoring [15.920402427606959]
Modern real-time Structural Health Monitoring systems can generate a considerable amount of information.
Current cloud-based solutions cannot scale if the raw data has to be collected from thousands of buildings.
This paper presents a full-stack deployment of an efficient and scalable anomaly detection pipeline for SHM systems.
arXiv Detail & Related papers (2022-03-04T15:37:20Z) - Online Self-Evolving Anomaly Detection in Cloud Computing Environments [6.480575492140354]
We present a emphself-evolving anomaly detection (SEAD) framework for cloud dependability assurance.
Our framework self-evolves by exploring newly verified anomaly records and continuously updating the anomaly detector online.
Our detectors can achieve 88.94% in sensitivity and 94.60% on average, which makes them suitable for real-world deployment.
arXiv Detail & Related papers (2021-11-16T05:13:38Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Data Mining with Big Data in Intrusion Detection Systems: A Systematic
Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed
Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z) - Adaptive Anomaly Detection for IoT Data in Hierarchical Edge Computing [71.86955275376604]
We propose an adaptive anomaly detection approach for hierarchical edge computing (HEC) systems to solve this problem.
We design an adaptive scheme to select one of the models based on the contextual information extracted from input data, to perform anomaly detection.
We evaluate our proposed approach using a real IoT dataset, and demonstrate that it reduces detection delay by 84% while maintaining almost the same accuracy as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-01-10T05:29:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.