Impact of Log Parsing on Deep Learning-Based Anomaly Detection
- URL: http://arxiv.org/abs/2305.15897v4
- Date: Mon, 19 Aug 2024 10:55:08 GMT
- Title: Impact of Log Parsing on Deep Learning-Based Anomaly Detection
- Authors: Zanis Ali Khan, Donghwan Shin, Domenico Bianculli, Lionel Briand,
- Abstract summary: We show that there is no strong correlation between log parsing accuracy and anomaly detection accuracy.
We experimentally confirm existing theoretical results showing that it is a property that we refer to as distinguishability in log parsing results.
- Score: 4.0719622481627376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software systems log massive amounts of data, recording important runtime information. Such logs are used, for example, for log-based anomaly detection, which aims to automatically detect abnormal behaviors of the system under analysis by processing the information recorded in its logs. Many log-based anomaly detection techniques based on deep learning models include a pre-processing step called log parsing. However, understanding the impact of log parsing on the accuracy of anomaly detection techniques has received surprisingly little attention so far. Investigating what are the key properties log parsing techniques should ideally have to help anomaly detection is therefore warranted. In this paper, we report on a comprehensive empirical study on the impact of log parsing on anomaly detection accuracy, using 13 log parsing techniques, seven anomaly detection techniques (five based on deep learning and two based on traditional machine learning) on three publicly available log datasets. Our empirical results show that, despite what is widely assumed, there is no strong correlation between log parsing accuracy and anomaly detection accuracy, regardless of the metric used for measuring log parsing accuracy. Moreover, we experimentally confirm existing theoretical results showing that it is a property that we refer to as distinguishability in log parsing results as opposed to their accuracy that plays an essential role in achieving accurate anomaly detection.
Related papers
- Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction [1.474723404975345]
High cost of manual annotation and dynamic nature of usage scenarios present major challenges to effective log analysis.
This study proposes a novel log feature extraction model called DualGCN-LogAE, designed to adapt to various scenarios.
We also introduce Log2graphs, an unsupervised log anomaly detection method based on the feature extractor.
arXiv Detail & Related papers (2024-09-18T11:35:58Z) - Reducing Events to Augment Log-based Anomaly Detection Models: An Empirical Study [8.110288854047417]
We propose LogCleaner: an efficient methodology for the automatic reduction of log events in the context of anomaly detection.
Experimental outcomes highlight LogCleaner's capability to reduce over 70% of log events in anomaly detection, accelerating the model's inference speed by approximately 300%.
arXiv Detail & Related papers (2024-09-07T14:02:02Z) - LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z) - GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Feature Selection for Fault Detection and Prediction based on Event Log
Analysis [14.80211278818555]
Event logs are widely used for anomaly detection and prediction in complex systems.
We develop a feature selection method for log-based anomaly detection and prediction, largely improving the effectiveness and efficiency.
arXiv Detail & Related papers (2022-08-19T16:43:37Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak
Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - Log-based Anomaly Detection Without Log Parsing [7.66638994053231]
We propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing.
Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages.
Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming the existing approaches.
arXiv Detail & Related papers (2021-08-04T10:42:13Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.