What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach
- URL: http://arxiv.org/abs/2409.20503v1
- Date: Mon, 30 Sep 2024 17:03:13 GMT
- Title: What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach
- Authors: Xingfang Wu, Heng Li, Foutse Khomh,
- Abstract summary: We propose a transformer-based anomaly detection model that can capture semantic, sequential, and temporal information in the log data.
We conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information in anomaly detection.
The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets.
- Score: 12.980238412281471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, existing approaches have not captured the timestamps in the log data, which can potentially provide more fine-grained temporal information than sequential information. In this work, we propose a configurable transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model's features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information in anomaly detection. When presented with log sequences of varying lengths, the model can attain competitive and consistently stable performance compared to the baselines. The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.
Related papers
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly
Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graphtemporal process and anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z) - GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z) - A Critical Review of Common Log Data Sets Used for Evaluation of
Sequence-based Anomaly Detection Techniques [2.5339493426758906]
We analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection.
Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
arXiv Detail & Related papers (2023-09-06T09:31:17Z) - WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern
Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposed the WePaMaDM-Outlier Detection with distinct mass data mining domain.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Detecting Log Anomalies with Multi-Head Attention (LAMA) [2.0684234025249713]
We propose LAMA, a multi-head attention based sequential model to process log streams as template activity (event) sequences.
A next event prediction task is applied to train the model for anomaly detection.
arXiv Detail & Related papers (2021-01-07T06:15:59Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z) - Dividing Deep Learning Model for Continuous Anomaly Detection of
Inconsistent ICT Systems [0.0]
We propose an ICT-systems-monitoring method with deep learning models divided based on the correlation of log data.
When some of the log data changes, our method can continue health monitoring with the divided models which are not affected by changes in the log data.
arXiv Detail & Related papers (2020-03-24T11:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.