Related papers: Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

URL: http://arxiv.org/abs/2102.11570v1
Date: Tue, 23 Feb 2021 09:17:05 GMT
Title: Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
Authors: Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao
Abstract summary: Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
Score: 59.04636530383049
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users that communicate, compute, and store information. Therefore, timely and accurate anomaly detection is necessary for reliability, security, safe operation, and mitigation of losses in these increasingly important systems. Recently, the evolution of the software industry opens up several problems that need to be tackled including (1) addressing the software evolution due software upgrades, and (2) solving the cold-start problem, where data from the system of interest is not available. In this paper, we propose a framework for anomaly detection in log data, as a major troubleshooting source of system information. To that end, we utilize pre-trained general-purpose language models to preserve the semantics of log messages and map them into log vector embeddings. The key idea is that these representations for the logs are robust and less invariant to changes in the logs, and therefore, result in a better generalization of the anomaly detection models. We perform several experiments on a cloud dataset evaluating different language models for obtaining numerical log representations such as BERT, GPT-2, and XL. The robustness is evaluated by gradually altering log messages, to simulate a change in semantics. Our results show that the proposed approach achieves high performance and robustness, which opens up possibilities for future research in this direction.

Related papers

Anomaly Detection on Unstable Logs with GPT Models [1.9713190626298576]
This paper reports on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs. The pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information. The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases.
arXiv Detail & Related papers (2024-06-11T17:13:18Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
LogGPT: Log Anomaly Detection via GPT [15.790373280124196]
We propose LogGPT, a novel framework that employs GPT for log anomaly detection. LogGPT is first trained to predict the next log entry based on the preceding sequence. We propose a novel reinforcement learning strategy to finetune the model specifically for the log anomaly detection task.
arXiv Detail & Related papers (2023-09-25T19:29:50Z)
EvLog: Identifying Anomalous Logs over Software Evolution [31.46106509190191]
We propose a novel unsupervised approach named Evolving Log extractor (EvLog) to process logs without parsing. EvLog implements an anomaly discriminator with an attention mechanism to identify the anomalous logs and avoid the issue brought by the unstable sequence. EvLog has shown effectiveness in two real-world system evolution log datasets with an average F1 score of 0.955 and 0.847 in the intra-version setting and inter-version setting, respectively.
arXiv Detail & Related papers (2023-06-02T12:58:00Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model [12.00171674362062]
The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention. Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template. In this study, we propose LAnoBERT, exhibiting excellent natural language processing performance.
arXiv Detail & Related papers (2021-11-18T07:46:35Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
Log-based Anomaly Detection Without Log Parsing [7.66638994053231]
We propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing. Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages. Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming the existing approaches.
arXiv Detail & Related papers (2021-08-04T10:42:13Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance. One direction aims at the recognition of re-occurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations. We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.