Anomaly Detection on Unstable Logs with GPT Models
- URL: http://arxiv.org/abs/2406.07467v1
- Date: Tue, 11 Jun 2024 17:13:18 GMT
- Title: Anomaly Detection on Unstable Logs with GPT Models
- Authors: Fatemeh Hadadi, Qinghua Xu, Domenico Bianculli, Lionel Briand,
- Abstract summary: This paper reports on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs.
The pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information.
The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases.
- Score: 1.9713190626298576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Log-based anomaly detection has been widely studied in the literature as a way to increase the dependability of software-intensive systems. In reality, logs can be unstable due to changes made to the software during its evolution. This, in turn, degrades the performance of downstream log analysis activities, such as anomaly detection. The critical challenge in detecting anomalies on these unstable logs is the lack of information about the new logs, due to insufficient log data from new software versions. The application of Large Language Models (LLMs) to many software engineering tasks has revolutionized various domains. In this paper, we report on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs. The main motivation is that the pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information, which can be leveraged to mitigate the data insufficiency issue in the context of software evolution. Our experimental results on the two-version dataset of LOGEVOL-Hadoop show that the fine-tuned LLM (GPT-3) fares slightly better than supervised baselines when evaluated on unstable logs. The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases. However, it is unclear whether the difference is practically significant in all cases. Lastly, our comparison of prompt engineering (with GPT-4) and fine-tuning reveals that the latter provides significantly superior performance on both stable and unstable logs, offering valuable insights into the effective utilization of LLMs in this domain.
Related papers
- Semi-supervised learning via DQN for log anomaly detection [1.5339370927841764]
Current methods in log anomaly detection face challenges such as underutilization of unlabeled data, imbalance between normal and anomaly class data, and high rates of false positives and false negatives.
We propose a semi-supervised log anomaly detection method named DQNLog, which integrates deep reinforcement learning to enhance anomaly detection performance.
We evaluate DQNLog on three widely used datasets, demonstrating its ability to effectively utilize large-scale unlabeled data.
arXiv Detail & Related papers (2024-01-06T08:04:13Z) - LogGPT: Log Anomaly Detection via GPT [15.790373280124196]
We propose LogGPT, a novel framework that employs GPT for log anomaly detection.
LogGPT is first trained to predict the next log entry based on the preceding sequence.
We propose a novel reinforcement learning strategy to finetune the model specifically for the log anomaly detection task.
arXiv Detail & Related papers (2023-09-25T19:29:50Z) - Log-based Anomaly Detection based on EVT Theory with feedback [31.949892354842525]
We present an accurate, lightweight, and adaptive log-based anomaly detection framework, referred to as SeaLog.
Our method introduces a Trie-based Detection Agent (TDA) that employs a lightweight, dynamically-growing trie structure for real-time anomaly detection.
To enhance TDA's accuracy in response to evolving log data, we enable it to receive feedback from experts.
arXiv Detail & Related papers (2023-06-08T08:34:58Z) - EvLog: Identifying Anomalous Logs over Software Evolution [31.46106509190191]
We propose a novel unsupervised approach named Evolving Log extractor (EvLog) to process logs without parsing.
EvLog implements an anomaly discriminator with an attention mechanism to identify the anomalous logs and avoid the issue brought by the unstable sequence.
EvLog has shown effectiveness in two real-world system evolution log datasets with an average F1 score of 0.955 and 0.847 in the intra-version setting and inter-version setting, respectively.
arXiv Detail & Related papers (2023-06-02T12:58:00Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak
Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly
detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE)
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.