Related papers: ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection

ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection

URL: http://arxiv.org/abs/2301.07846v1
Date: Thu, 19 Jan 2023 01:54:48 GMT
Title: ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection
Authors: Chris Egersdoerfer, Dong Dai, Di Zhang
Abstract summary: This study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping semantically and sentimentally similar logs, this approach aims to represent log sequences with the smallest amount of unique log keys, intending to improve the ability of a downstream sequence-based model to effectively learn the log patterns.
Score: 3.3196401064045014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the increasing prevalence of scalable file systems in the context of High Performance Computing (HPC), the importance of accurate anomaly detection on runtime logs is increasing. But as it currently stands, many state-of-the-art methods for log-based anomaly detection, such as DeepLog, have encountered numerous challenges when applied to logs from many parallel file systems (PFSes), often due to their irregularity and ambiguity in time-based log sequences. To circumvent these problems, this study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping semantically and sentimentally similar logs, this approach aims to represent log sequences with the smallest amount of unique log keys, intending to improve the ability of a downstream sequence-based model to effectively learn the log patterns. The preliminary results of ClusterLog indicate not only its effectiveness in reducing the granularity of log sequences without the loss of important sequence information but also its generalizability to different file systems' logs.

Related papers

LogLSHD: Fast Log Parsing with Locality-Sensitive Hashing and Dynamic Time Warping [2.415288727960745]
Large-scale software systems generate vast volumes of system logs that are essential for monitoring, diagnosing, and performance optimization. LogLSHD demonstrates exceptional efficiency in parsing time, significantly outperforming state-of-the-art methods. For example, compared to Drain, LogLSHD reduces the average parsing time by 73% while increasing the average parsing accuracy by 15% on the LogHub 2.0 benchmark.
arXiv Detail & Related papers (2025-04-02T23:08:04Z)
LogLLM: Log-based Anomaly Detection Using Large Language Models [8.03646578793411]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs) LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences. Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log that requires no training process or labeled data. We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM considering Token-level information [7.861095039299132]
The need for log anomaly detection is growing, especially in real-world applications. Traditional deep learning-based anomaly detection models require dataset-specific training, leading to corresponding delays. We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays.
arXiv Detail & Related papers (2023-11-09T06:11:44Z)
GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs. We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
LogGD:Detecting Anomalies from System Logs by Graph Neural Networks [14.813971618949068]
We propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue. We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection.
arXiv Detail & Related papers (2022-09-16T11:51:58Z)
LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction [31.31712326361932]
We propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences. Specifically, we design an end-to-end iterative process, where the keywords of unlabeled logs are first extracted to construct a log-event graph. Then, we build a subgraph annotator to generate pseudo labels for unlabeled log sequences.
arXiv Detail & Related papers (2022-08-23T09:32:19Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations. We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.