LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking
- URL: http://arxiv.org/abs/2404.11294v2
- Date: Fri, 19 Apr 2024 01:18:41 GMT
- Title: LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking
- Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar
- Abstract summary: We propose LogSD, a novel semi-supervised approach based on self-supervised learning.
We show that LogSD significantly outperforms eight state-of-the-art benchmark methods.
- Score: 14.784236273395017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Log analysis is one of the main techniques that engineers use to troubleshoot large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention because, unlike their supervised and unsupervised counterparts, they strike a balance between relaxed labeled-data requirements and optimal detection performance. However, existing semi-supervised methods overlook the bias that highly frequent log messages can introduce into the learned normal patterns, which leads to unsatisfactory performance. In this study, we propose LogSD, a novel semi-supervised approach based on self-supervised learning. LogSD employs a dual-network architecture and incorporates a frequency-based masking scheme, a global-to-local reconstruction paradigm, and three self-supervised learning tasks. These features enable LogSD to focus more on relatively infrequent log messages, thereby learning less biased and more discriminative patterns from historical normal data and ultimately improving anomaly detection performance. Extensive experiments on three commonly used datasets show that LogSD significantly outperforms eight state-of-the-art benchmark methods.
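To make the frequency-based masking idea concrete, here is a minimal, hypothetical Python sketch (the function name frequency_based_mask, the alpha knob, and the toy data are illustrative assumptions, not the authors' implementation): templates that occur more often are masked with higher probability, so a reconstruction model is pushed to learn from the rarer, more discriminative messages.

```python
from collections import Counter
import random

def frequency_based_mask(sequences, mask_token="<MASK>", alpha=1.0, seed=0):
    """Hypothetical sketch: mask log templates with probability that grows with
    their relative frequency, so reconstruction must rely on rarer templates.
    This is an illustration only, not the LogSD implementation."""
    rng = random.Random(seed)
    counts = Counter(t for seq in sequences for t in seq)
    total = sum(counts.values())
    # Masking probability rises with relative frequency, capped at 0.9.
    mask_prob = {t: min(0.9, alpha * c * len(counts) / total) for t, c in counts.items()}
    masked = []
    for seq in sequences:
        masked.append([mask_token if rng.random() < mask_prob[t] else t for t in seq])
    return masked

# Toy usage: "login_ok" dominates, so it is masked far more often than "disk_error".
sessions = [["login_ok", "login_ok", "disk_error", "login_ok"],
            ["login_ok", "login_ok", "login_ok", "timeout"]]
print(frequency_based_mask(sessions))
```

In LogSD itself this masking feeds a dual-network, global-to-local reconstruction pipeline with three self-supervised tasks; the sketch above only illustrates the frequency bias in the masking step.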
Related papers
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM considering Token-level information [7.861095039299132]
The need for log anomaly detection is growing, especially in real-world applications.
Traditional deep learning-based anomaly detection models require dataset-specific training, which delays their deployment.
We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays.
arXiv Detail & Related papers (2023-11-09T06:11:44Z)
- Log-based Anomaly Detection based on EVT Theory with feedback [31.949892354842525]
We present an accurate, lightweight, and adaptive log-based anomaly detection framework, referred to as SeaLog.
Our method introduces a Trie-based Detection Agent (TDA) that employs a lightweight, dynamically-growing trie structure for real-time anomaly detection.
To enhance TDA's accuracy in response to evolving log data, we enable it to receive feedback from experts.
arXiv Detail & Related papers (2023-06-08T08:34:58Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection [30.52620190783608]
We provide a review and evaluation of five popular models used by six state-of-the-art anomaly detectors.
Four of the selected methods are unsupervised and the remaining two are supervised.
We believe our work can serve as a basis in this field and contribute to future academic research and industrial applications.
arXiv Detail & Related papers (2021-07-13T08:10:47Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (a rough sketch of this masking setup appears after this list).
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
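The NuLog entry above describes formulating log parsing as masked language modeling. As a hedged illustration (the helper make_masked_lm_pairs and the sample log line are hypothetical, and this is not the NuLog code), the sketch below generates the masked input/target pairs such a model would train on; the intuition is that constant template tokens become easy to predict while variable parameters remain hard.

```python
def make_masked_lm_pairs(log_line, mask_token="<MASK>"):
    """Hypothetical sketch of a masked-language-modeling setup for log parsing:
    hide one token at a time and ask a model to predict it. Tokens that the
    model predicts confidently can be treated as the constant part of a
    template; hard-to-predict tokens can be treated as variable parameters."""
    tokens = log_line.split()
    pairs = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        pairs.append((masked, target))  # (model input, token to predict)
    return pairs

# Toy usage on a single log line.
for inp, target in make_masked_lm_pairs("Connection from 10.0.0.5 closed"):
    print(inp, "->", target)
```

A sequence model trained on such pairs could then recover templates by keeping confidently predicted tokens and replacing the rest with wildcards; the sketch only shows how the training pairs might be constructed.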