How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters
- URL: http://arxiv.org/abs/2408.07816v1
- Date: Wed, 14 Aug 2024 21:10:15 GMT
- Title: How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters
- Authors: Monika Steidl, Benedikt Dornauer, Michael Felderer, Rudolf Ramler, Mircea-Cristian Racasan, Marko Gattringer,
- Abstract summary: This paper seeks to comprehend anomalies and current anomaly detection approaches across diverse industrial sectors.
It also aims to pinpoint the parameters necessary for identifying anomalies via runtime monitoring data.
- Score: 4.041882008624403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deviations from expected behavior during runtime, known as anomalies, have become more common due to the systems' complexity, especially for microservices. Consequently, analyzing runtime monitoring data, such as logs, traces for microservices, and metrics, is challenging due to the large volume of data collected. Developing effective rules or AI algorithms requires a deep understanding of this data to reliably detect unforeseen anomalies. This paper seeks to comprehend anomalies and current anomaly detection approaches across diverse industrial sectors. Additionally, it aims to pinpoint the parameters necessary for identifying anomalies via runtime monitoring data. Therefore, we conducted semi-structured interviews with fifteen industry participants who rely on anomaly detection during runtime. Additionally, to supplement information from the interviews, we performed a literature review focusing on anomaly detection approaches applied to industrial real-life datasets. Our paper (1) demonstrates the diversity of interpretations and examples of software anomalies during runtime and (2) explores the reasons behind choosing rule-based approaches in the industry over self-developed AI approaches. AI-based approaches have become prominent in published industry-related papers in the last three years. Furthermore, we (3) identified key monitoring parameters collected during runtime (logs, traces, and metrics) that assist practitioners in detecting anomalies during runtime without introducing bias in their anomaly detection approach due to inconclusive parameters.
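The abstract contrasts rule-based detection with AI-based approaches on runtime metrics. As a rough illustration only, and not the method of the paper or of any interviewed company, the sketch below compares a fixed-limit rule with a simple rolling z-score check on a single metric. The function names, the window size, the thresholds, and the `latency_ms` sample values are all hypothetical, chosen purely for the example.

```python
from statistics import mean, stdev


def rule_based_alerts(values, upper_limit):
    """Return indices where the metric exceeds a fixed, hand-picked limit."""
    return [i for i, v in enumerate(values) if v > upper_limit]


def zscore_alerts(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` samples by more than `threshold` sample standard deviations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to judge deviation; skip.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical response-time samples (milliseconds) with one spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]

print(rule_based_alerts(latency_ms, upper_limit=200))  # [6]
print(zscore_alerts(latency_ms))                       # [6]
```

Both checks flag the same spike here. The fixed rule depends on someone knowing a sensible limit in advance, while the rolling check adapts to the recent baseline but needs enough history and a tuned threshold.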
Related papers
- A Survey of Time Series Anomaly Detection Methods in the AIOps Domain [16.92261613814882]
Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs).
This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps).
It explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
arXiv Detail & Related papers (2023-08-01T09:13:57Z)
- Precursor-of-Anomaly Detection for Irregular Time Series [31.73234935455713]
We present a novel type of anomaly detection, called Precursor-of-Anomaly (PoA) detection.
To solve both problems at the same time, we present a neural controlled differential equation-based neural network and its multi-task learning algorithm.
arXiv Detail & Related papers (2023-06-27T14:10:09Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management [15.22568616519016]
We introduce ALARM, an end-to-end framework that supports the anomaly mining cycle comprehensively.
It offers anomaly explanations and an interactive GUI for human-in-the-loop processes.
We demonstrate ALARM's efficacy through a series of case studies with fraud analysts from the financial industry.
arXiv Detail & Related papers (2023-04-06T20:49:36Z)
- Real-Time Outlier Detection with Dynamic Process Limits [0.609170287691728]
This paper proposes an online anomaly detection algorithm for existing real-time infrastructures.
An online inverse cumulative distribution-based approach is introduced to eliminate common problems of offline anomaly detectors.
The benefit of the proposed method is the ease of use, fast computation, and deployability as shown in two case studies of real microgrid operation data.
arXiv Detail & Related papers (2023-01-31T10:23:02Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- Deep Learning for Time Series Anomaly Detection: A Survey [53.83593870825628]
Time series anomaly detection has applications in a wide range of research fields, including manufacturing and healthcare.
The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns.
This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning.
arXiv Detail & Related papers (2022-11-09T22:40:22Z)
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
- Functional Anomaly Detection: a Benchmark Study [4.444788548423704]
Anomaly detection can now rely on measurements sampled at a very high frequency.
It is the purpose of this paper to investigate the performance of recent techniques for anomaly detection in the functional setup on real datasets.
arXiv Detail & Related papers (2022-01-13T18:20:32Z)
- A2Log: Attentive Augmented Log Anomaly Detection [53.06341151551106]
Anomaly detection becomes increasingly important for the dependability and serviceability of IT services.
Existing unsupervised methods need anomaly examples to obtain a suitable decision boundary.
We develop A2Log, which is an unsupervised anomaly detection method consisting of two steps: Anomaly scoring and anomaly decision.
arXiv Detail & Related papers (2021-09-20T13:40:21Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.