A Critical Review of Common Log Data Sets Used for Evaluation of
Sequence-based Anomaly Detection Techniques
- URL: http://arxiv.org/abs/2309.02854v1
- Date: Wed, 6 Sep 2023 09:31:17 GMT
- Title: A Critical Review of Common Log Data Sets Used for Evaluation of
Sequence-based Anomaly Detection Techniques
- Authors: Max Landauer and Florian Skopik and Markus Wurzenberger
- Abstract summary: We analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection.
Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
- Score: 2.5339493426758906
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Log data store event execution patterns that correspond to underlying
workflows of systems or applications. While most logs are informative, log data
also include artifacts that indicate failures or incidents. Accordingly, log
data are often used to evaluate anomaly detection techniques that aim to
automatically disclose unexpected or otherwise relevant system behavior
patterns. Recently, detection approaches leveraging deep learning have
increasingly focused on anomalies that manifest as changes of sequential
patterns within otherwise normal event traces. Several publicly available data
sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become
standards for evaluating these anomaly detection techniques, however, the
appropriateness of these data sets has not been closely investigated in the
past. In this paper we therefore analyze six publicly available log data sets
with focus on the manifestations of anomalies and simple techniques for their
detection. Our findings suggest that most anomalies are not directly related to
sequential manifestations and that advanced detection techniques are not
required to achieve high detection rates on these data sets.
Related papers
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly Detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con2, which addresses this problem by setting normal training data into distinct contexts.
Our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting.
arXiv Detail & Related papers (2024-05-29T07:59:06Z) - ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a one-for-all'' GAD model to detect anomalies across various graph datasets on-the-fly.
equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.
Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z) - LogELECTRA: Self-supervised Anomaly Detection for Unstructured Logs [0.0]
The goal of log-based anomaly detection is to automatically detect system anomalies by analyzing the large number of logs generated in a short period of time.
Previous studies have used a log to extract templates from unstructured log data and detect anomalies on the basis of patterns of the template occurrences.
We propose LogELECTRA, a new log anomaly detection model that analyzes a single line of log messages more deeply on the basis of self-supervised anomaly detection.
arXiv Detail & Related papers (2024-02-16T01:47:02Z) - Semi-supervised learning via DQN for log anomaly detection [1.5339370927841764]
We propose a semi-supervised log anomaly detection method that combines the DQN algorithm from deep reinforcement learning, which is called DQNLog.
Our evaluation on three widely-used datasets demonstrates that DQNLog significantly improves recall rate and F1-score while maintaining precision, validating its practicality.
arXiv Detail & Related papers (2024-01-06T08:04:13Z) - WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern
Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposed the WePaMaDM-Outlier Detection with distinct mass data mining domain.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - A Taxonomy of Anomalies in Log Data [0.09558392439655014]
A common taxonomy for anomalies already exists, but it has not yet been applied specifically to log data.
We present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets.
Our results show, that the most common anomaly type is also the easiest to predict.
arXiv Detail & Related papers (2021-11-26T12:23:06Z) - Toward Deep Supervised Anomaly Detection: Reinforcement Learning from
Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z) - A Background-Agnostic Framework with Adversarial Training for Abnormal
Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years.
We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.