Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
- URL: http://arxiv.org/abs/2102.11570v1
- Date: Tue, 23 Feb 2021 09:17:05 GMT
- Title: Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
- Authors: Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao
- Abstract summary: Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
- Score: 59.04636530383049
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Anomalies or failures in large computer systems, such as the cloud, have an
impact on a large number of users that communicate, compute, and store
information. Therefore, timely and accurate anomaly detection is necessary for
reliability, security, safe operation, and mitigation of losses in these
increasingly important systems. Recently, the evolution of the software
industry has raised several problems that need to be tackled, including (1)
handling software evolution due to software upgrades and (2) solving the
cold-start problem, where data from the system of interest is not available. In
this paper, we propose a framework for anomaly detection in log data, as a
major troubleshooting source of system information. To that end, we utilize
pre-trained general-purpose language models to preserve the semantics of log
messages and map them into log vector embeddings. The key idea is that these
representations of the logs are robust and less sensitive to changes in the
logs and therefore result in better generalization of the anomaly detection
models. We perform several experiments on a cloud dataset, evaluating different
language models for obtaining numerical log representations, such as BERT,
GPT-2, and Transformer-XL. Robustness is evaluated by gradually altering log
messages to simulate a change in semantics. Our results show that the proposed approach
achieves high performance and robustness, which opens up possibilities for
future research in this direction.
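The pipeline described in the abstract (embed each log message, score it against normal behaviour, and probe robustness by gradually altering messages) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation: `embed` is a stand-in that hashes word tokens into a fixed-size vector so the example runs without model weights, whereas the paper uses pre-trained language models such as BERT or GPT-2; nearest-neighbour cosine scoring is a simple detector chosen here for illustration; and `perturb`, which replaces a fraction of words with random strings, is only one assumed way to simulate changing log messages. The sample logs are made up.

```python
import math
import random
import re
import string
import zlib

DIM = 256  # assumed size of the stand-in embedding


def embed(message: str, dim: int = DIM) -> list[float]:
    """Stand-in for a pre-trained LM encoder (hypothetical): an
    L2-normalised hashed bag of lowercase word tokens."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", message.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def anomaly_score(message: str, normal: list[list[float]]) -> float:
    """1 minus the cosine similarity to the closest normal log embedding.
    All vectors are unit length, so the dot product equals the cosine."""
    e = embed(message)
    best = max(sum(x * y for x, y in zip(e, n)) for n in normal)
    return 1.0 - best


def perturb(message: str, fraction: float, rng: random.Random) -> str:
    """Simulate log evolution (assumption): replace round(fraction * n) of
    the n whitespace-separated words with random six-letter strings."""
    words = message.split()
    k = round(fraction * len(words))
    for i in rng.sample(range(len(words)), k):
        words[i] = "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
    return " ".join(words)


# Hypothetical "normal" training logs and their embeddings
normal_logs = [
    "Starting instance on host compute-1",
    "Instance spawned successfully on host compute-2",
    "Terminating instance on host compute-1",
]
normal = [embed(m) for m in normal_logs]

# Score an unseen-but-normal message under increasing alteration
rng = random.Random(0)
probe = "Starting instance on host compute-3"
for fraction in (0.0, 0.2, 0.4, 0.6):
    altered = perturb(probe, fraction, rng)
    print(f"{fraction:.1f}  {anomaly_score(altered, normal):.3f}  {altered}")

# A message unlike any normal log should score high
print(anomaly_score("Kernel panic: disk failure detected", normal))
```

Printing the score per alteration fraction gives a small robustness curve. Because the hashed stand-in has no notion of meaning, replaced words simply drop out of the overlap with normal logs; the paper's key idea is that a pre-trained encoder would keep semantically similar messages close, so swapping one in for `embed` is where robustness would be expected to come from.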
Related papers
- Anomaly Detection on Unstable Logs with GPT Models [1.9713190626298576]
This paper reports on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs.
The pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information.
The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases.
(2024-06-11T17:13:18Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
(2024-01-09T12:55:21Z)
- EvLog: Identifying Anomalous Logs over Software Evolution [31.46106509190191]
We propose a novel unsupervised approach named Evolving Log extractor (EvLog) to process logs without parsing.
EvLog implements an anomaly discriminator with an attention mechanism to identify anomalous logs and avoid the issues caused by unstable log sequences.
EvLog has shown effectiveness on two real-world system evolution log datasets, with average F1 scores of 0.955 and 0.847 in the intra-version and inter-version settings, respectively.
(2023-06-02T12:58:00Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
(2023-01-25T16:34:43Z)
- LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model [12.00171674362062]
The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention.
Previous studies performed anomaly detection by first converting various forms of log data into standardized templates and then applying detection algorithms.
In this study, we propose LAnoBERT, a method based on the BERT masked language model, which exhibits excellent natural language processing performance.
(2021-11-18T07:46:35Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
(2021-11-02T15:16:08Z)
- Log-based Anomaly Detection Without Log Parsing [7.66638994053231]
We propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing.
Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages.
Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming the existing approaches.
(2021-08-04T10:42:13Z)
- TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
(2021-02-25T14:24:49Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score compared to previous methods.
(2020-08-21T07:26:55Z)
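The LogLAB and PULL entries in this list both start from estimated failure time windows reported by monitoring systems. A minimal sketch of that first labeling step follows, assuming each log carries a numeric timestamp, each window is a closed `(start, end)` interval, logs outside every window are treated as normal, and logs inside a window are left unlabeled. The helpers `in_any_window` and `weak_labels` are hypothetical, the sample logs are made up, and the refinement each method then performs (attention-based labeling in LogLAB, iterative PU learning in PULL) is not shown.

```python
Window = tuple[float, float]  # (start, end) of an estimated failure window


def in_any_window(t: float, windows: list[Window]) -> bool:
    """True if timestamp t falls inside any closed interval [start, end]."""
    return any(start <= t <= end for start, end in windows)


def weak_labels(timestamps: list[float], windows: list[Window]) -> list[str]:
    """Coarse initial labels (hypothetical helper): logs outside every
    failure window are 'normal'; logs inside a window stay 'unlabeled',
    because a window also contains many logs that are not anomalous."""
    return ["unlabeled" if in_any_window(t, windows) else "normal" for t in timestamps]


# Made-up (timestamp, message) pairs and failure windows
logs = [
    (1.0, "heartbeat ok"),
    (5.0, "connection reset"),
    (10.0, "heartbeat ok"),
    (12.0, "disk write timeout"),
]
windows = [(4.0, 6.0), (11.0, 20.0)]

labels = weak_labels([t for t, _ in logs], windows)
for (t, msg), label in zip(logs, labels):
    print(f"{t:5.1f}  {label:9s}  {msg}")
```

Interval membership is deliberately coarse: it separates a reliable "normal" set from a noisy in-window set, which is the kind of split a positive-unlabeled learner expects as input.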
This list is automatically generated from the titles and abstracts of the papers in this site.