ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection
- URL: http://arxiv.org/abs/2301.07846v1
- Date: Thu, 19 Jan 2023 01:54:48 GMT
- Title: ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection
- Authors: Chris Egersdoerfer, Dong Dai, Di Zhang
- Abstract summary: This study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity.
By grouping semantically and sentimentally similar logs, this approach aims to represent log sequences with the smallest amount of unique log keys, intending to improve the ability of a downstream sequence-based model to effectively learn the log patterns.
- Score: 3.3196401064045014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing prevalence of scalable file systems in the context of
High Performance Computing (HPC), the importance of accurate anomaly detection
on runtime logs is increasing. But as it currently stands, many
state-of-the-art methods for log-based anomaly detection, such as DeepLog, have
encountered numerous challenges when applied to logs from many parallel file
systems (PFSes), often due to their irregularity and ambiguity in time-based
log sequences. To circumvent these problems, this study proposes ClusterLog, a
log pre-processing method that clusters the temporal sequence of log keys based
on their semantic similarity. By grouping semantically and sentimentally
similar logs, this approach aims to represent log sequences with the smallest
amount of unique log keys, intending to improve the ability of a downstream
sequence-based model to effectively learn the log patterns. The preliminary
results of ClusterLog indicate not only its effectiveness in reducing the
granularity of log sequences without the loss of important sequence information
but also its generalizability to different file systems' logs.
Related papers
- Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z) - LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z) - RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM
considering Token-level information [7.861095039299132]
The need for log anomaly detection is growing, especially in real-world applications.
Traditional deep learning-based anomaly detection models require dataset-specific training, leading to corresponding delays.
We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays.
arXiv Detail & Related papers (2023-11-09T06:11:44Z) - GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z) - Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z) - LogGD:Detecting Anomalies from System Logs by Graph Neural Networks [14.813971618949068]
We propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue.
We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection.
arXiv Detail & Related papers (2022-09-16T11:51:58Z) - LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph
Construction [31.31712326361932]
We propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences.
Specifically, we design an end-to-end iterative process, where the keywords of unlabeled logs are first extracted to construct a log-event graph.
Then, we build a subgraph annotator to generate pseudo labels for unlabeled log sequences.
arXiv Detail & Related papers (2022-08-23T09:32:19Z) - Log-based Anomaly Detection Without Log Parsing [7.66638994053231]
We propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing.
Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages.
Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming the existing approaches.
arXiv Detail & Related papers (2021-08-04T10:42:13Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.