RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM
considering Token-level information
- URL: http://arxiv.org/abs/2311.05160v1
- Date: Thu, 9 Nov 2023 06:11:44 GMT
- Title: RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM
considering Token-level information
- Authors: Gunho No, Yukyung Lee, Hyeongwon Kang, Pilsung Kang
- Abstract summary: The need for log anomaly detection is growing, especially in real-world applications.
Traditional deep learning-based anomaly detection models require dataset-specific training, leading to corresponding delays.
We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays.
- Score: 7.861095039299132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the IT industry advances, system log data becomes increasingly crucial.
Many computer systems rely on log texts for management due to restricted access
to source code. The need for log anomaly detection is growing, especially in
real-world applications, but identifying anomalies in rapidly accumulating logs
remains a challenging task. Traditional deep learning-based anomaly detection
models require dataset-specific training, leading to corresponding delays.
Notably, most methods only focus on sequence-level log information, which makes
the detection of subtle anomalies harder, and often involve inference processes
that are difficult to utilize in real-time. We introduce RAPID, a model that
capitalizes on the inherent features of log data to enable anomaly detection
without training delays, ensuring real-time capability. RAPID treats logs as
natural language, extracting representations using pre-trained language models.
Given that logs can be categorized based on system context, we implement a
retrieval-based technique to contrast test logs with the most similar normal
logs. This strategy not only obviates the need for log-specific training but
also adeptly incorporates token-level information, ensuring refined and robust
detection, particularly for unseen logs. We also propose the core set
technique, which can reduce the computational cost needed for comparison.
Experimental results show that even without training on log data, RAPID
demonstrates competitive performance compared to prior models and achieves the
best performance on certain datasets. Through various research questions, we
verified its capability for real-time detection without delay.
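The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each log has already been encoded by a pre-trained language model into one sequence-level vector and a list of token-level vectors; the tiny hand-written vectors below stand in for that output. The names `retrieve_nearest_normal`, `token_level_score`, `anomaly_score`, and `NORMAL_DB`, the use of cosine similarity, and the scoring rule (one minus the mean best token match) are all assumptions made for illustration. The core set technique is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_nearest_normal(query, normal_db):
    """Return the normal log whose sequence-level vector is most similar to the query."""
    return max(normal_db, key=lambda entry: cosine(query["seq"], entry["seq"]))


def token_level_score(query, normal):
    """Assumed scoring rule: 1 - mean over query tokens of their best match in the normal log."""
    best = [max(cosine(q, n) for n in normal["tokens"]) for q in query["tokens"]]
    return 1.0 - sum(best) / len(best)


def anomaly_score(query, normal_db):
    """Retrieve the closest normal log, then compare the two logs token by token."""
    return token_level_score(query, retrieve_nearest_normal(query, normal_db))


# Hypothetical stand-ins for PLM embeddings of two known-normal logs.
NORMAL_DB = [
    {"seq": [1.0, 0.0, 0.0], "tokens": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]},
    {"seq": [0.0, 1.0, 0.0], "tokens": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]},
]

# A test log identical to a normal log, and one containing an unfamiliar token.
seen = {"seq": [1.0, 0.0, 0.0], "tokens": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]}
unseen = {"seq": [0.8, 0.0, 0.6], "tokens": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]}

print("seen log score:  ", anomaly_score(seen, NORMAL_DB))
print("unseen log score:", anomaly_score(unseen, NORMAL_DB))
```

In this toy setup the unseen log scores higher than the seen one because one of its token vectors has no close counterpart in the retrieved normal log, even though the two logs are similar at the sequence level. No parameters are fitted to log data at any point; only a store of normal-log embeddings is needed.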
Related papers
- LogLLM: Log-based Anomaly Detection Using Large Language Models [8.03646578793411]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs).
LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences.
Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
- FastLogAD: Log Anomaly Detection with Mask-Guided Pseudo Anomaly Generation and Discrimination [13.458633961243498]
We propose FastLogAD, a generator-discriminator framework trained to generate pseudo-abnormal logs.
During the discriminative stage, FastLogAD learns a distinct separation between normal and pseudoabnormal samples.
Compared to previous methods, FastLogAD achieves at least a 10x speed increase in anomaly detection.
arXiv Detail & Related papers (2024-04-12T18:23:29Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model [12.00171674362062]
The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention.
Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template.
In this study, we propose LAnoBERT, exhibiting excellent natural language processing performance.
arXiv Detail & Related papers (2021-11-18T07:46:35Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)