Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance
- URL: http://arxiv.org/abs/2112.08940v1
- Date: Mon, 13 Dec 2021 22:30:07 GMT
- Authors: Xiaobo Huang, Amitabha Banerjee, Chien-Chia Chen, Chengzhi Huang, Tzu Yi Chuang, Abhishek Srivastava, Razvan Cheveresan
- Abstract summary: We discuss challenges to harness data to operate our ML-based anomaly detection system.
We demonstrate that by addressing these data challenges, we not only improve the accuracy of our performance anomaly detection model by 30%, but also ensure that the model's performance never degrades over time.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We discuss how VMware is solving the following challenges to harness data to
operate our ML-based anomaly detection system to detect performance issues in
our Software Defined Data Center (SDDC) enterprise deployments: (i) label
scarcity and label bias due to heavy dependency on unscalable human annotators,
and (ii) data drifts due to ever-changing workload patterns, software stack and
underlying hardware. Our anomaly detection system has been deployed in
production for many years and has successfully detected numerous major
performance issues. We demonstrate that by addressing these data challenges, we
not only improve the accuracy of our performance anomaly detection model by
30%, but also ensure that the model's performance never degrades over time.
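The abstract names data drift (changing workload patterns, software stack, and hardware) as a cause of model degradation, but it does not describe how the authors detect it. As a generic illustration only, not VMware's method, the sketch below compares a performance metric's distribution in a reference window against a recent window using the two-sample Kolmogorov-Smirnov (KS) statistic, the largest gap between the two empirical CDFs. The function names `ks_statistic` and `drifted`, the 0.3 threshold, and the latency samples are hypothetical values chosen for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], recent: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    if not reference or not recent:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    rec = sorted(recent)
    n, m = len(ref), len(rec)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so evaluating them at
    # every pooled observation is enough to find the maximum gap.
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / n
        cdf_rec = bisect_right(rec, x) / m
        max_gap = max(max_gap, abs(cdf_ref - cdf_rec))
    return max_gap


def drifted(reference: Sequence[float], recent: Sequence[float],
            threshold: float = 0.3) -> bool:
    """Flag drift when the KS statistic exceeds a fixed threshold.
    The 0.3 default is an arbitrary illustrative value (hypothetical)."""
    return ks_statistic(reference, recent) > threshold


if __name__ == "__main__":
    # Made-up latency samples (ms): a baseline window and a shifted recent window.
    baseline_ms = [10.2, 11.0, 9.8, 10.5, 10.9, 10.1, 9.9, 10.7]
    recent_ms = [14.8, 15.2, 13.9, 15.5, 14.1, 15.0, 14.6, 15.3]
    print(ks_statistic(baseline_ms, recent_ms))  # 1.0: the windows do not overlap
    print(drifted(baseline_ms, recent_ms))       # True
```

In a pipeline like the one the abstract describes, a flagged window could prompt re-labeling or retraining; whether the authors do this is not stated here. A library routine such as SciPy's `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value, which avoids hand-tuning a raw threshold.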