A Critical Review of Common Log Data Sets Used for Evaluation of
Sequence-based Anomaly Detection Techniques
- URL: http://arxiv.org/abs/2309.02854v1
- Date: Wed, 6 Sep 2023 09:31:17 GMT
- Title: A Critical Review of Common Log Data Sets Used for Evaluation of
Sequence-based Anomaly Detection Techniques
- Authors: Max Landauer and Florian Skopik and Markus Wurzenberger
- Abstract summary: We analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection.
Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
- Score: 2.5339493426758906
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Log data store event execution patterns that correspond to underlying
workflows of systems or applications. While most logs are informative, log data
also include artifacts that indicate failures or incidents. Accordingly, log
data are often used to evaluate anomaly detection techniques that aim to
automatically disclose unexpected or otherwise relevant system behavior
patterns. Recently, detection approaches leveraging deep learning have
increasingly focused on anomalies that manifest as changes of sequential
patterns within otherwise normal event traces. Several publicly available data
sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become
standards for evaluating these anomaly detection techniques; however, the
appropriateness of these data sets has not been closely investigated in the
past. In this paper we therefore analyze six publicly available log data sets
with focus on the manifestations of anomalies and simple techniques for their
detection. Our findings suggest that most anomalies are not directly related to
sequential manifestations and that advanced detection techniques are not
required to achieve high detection rates on these data sets.
Related papers
- What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach [12.980238412281471]
We propose a transformer-based anomaly detection model that can capture semantic, sequential, and temporal information in the log data.
We conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information in anomaly detection.
The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets.
arXiv Detail & Related papers (2024-09-30T17:03:13Z)
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a "one-for-all" GAD model to detect anomalies across various graph datasets on-the-fly.
Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.
Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z)
- Semi-supervised learning via DQN for log anomaly detection [1.5339370927841764]
Current methods in log anomaly detection face challenges such as underutilization of unlabeled data, imbalance between normal and anomaly class data, and high rates of false positives and false negatives.
We propose a semi-supervised log anomaly detection method named DQNLog, which integrates deep reinforcement learning to enhance anomaly detection performance.
We evaluate DQNLog on three widely used datasets, demonstrating its ability to effectively utilize large-scale unlabeled data.
arXiv Detail & Related papers (2024-01-06T08:04:13Z)
- WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern
Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposes WePaMaDM-Outlier Detection for distinct mass data mining domains.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- A Taxonomy of Anomalies in Log Data [0.09558392439655014]
A common taxonomy for anomalies already exists, but it has not yet been applied specifically to log data.
We present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets.
Our results show that the most common anomaly type is also the easiest to predict.
arXiv Detail & Related papers (2021-11-26T12:23:06Z)
- Toward Deep Supervised Anomaly Detection: Reinforcement Learning from
Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
- A Background-Agnostic Framework with Adversarial Training for Abnormal
Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years.
We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)