AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection

URL: http://arxiv.org/abs/2308.09324v1
Date: Fri, 18 Aug 2023 05:56:18 GMT
Title: AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
Authors: Yintong Huo, Yichen Li, Yuxin Su, Pinjia He, Zifan Xie, and Michael R. Lyu
Abstract summary: AutoLog is the first automated log generation methodology for anomaly detection. It generates run-time log sequences without actually running the system. It propagates the anomaly label to each acquired execution path based on human knowledge.
Score: 34.91789047641838
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid progress of modern computing systems has led to a growing interest in informative run-time logs. Various log-based anomaly detection techniques have been proposed to ensure software reliability. However, their implementation in the industry has been limited due to the lack of high-quality public log resources as training datasets. While some log datasets are available for anomaly detection, they suffer from limitations in (1) comprehensiveness of log events; (2) scalability over diverse systems; and (3) flexibility of log utility. To address these limitations, we propose AutoLog, the first automated log generation methodology for anomaly detection. AutoLog uses program analysis to generate run-time log sequences without actually running the system. AutoLog starts with probing comprehensive logging statements associated with the call graphs of an application. Then, it constructs execution graphs for each method after pruning the call graphs to find log-related execution paths in a scalable manner. Finally, AutoLog propagates the anomaly label to each acquired execution path based on human knowledge. It generates flexible log sequences by walking along the log execution paths with controllable parameters. Experiments on 50 popular Java projects show that AutoLog acquires significantly more (9x-58x) log events than existing log datasets from the same system, and generates log messages much faster (15x) with a single machine than existing passive data collection approaches. We hope AutoLog can facilitate the benchmarking and adoption of automated log analysis techniques.
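The pipeline described in the abstract (collect logging statements, prune the call graph to log-related methods, enumerate execution paths, propagate anomaly labels, then walk the paths with controllable parameters) can be illustrated with a toy sketch. Everything below is an assumption for illustration and not AutoLog's implementation: the call graph, the log messages, the function names, and the simplification that each call chain from the entry method to a leaf is one execution path.

```python
import random

# Hypothetical call graph: method -> callees (illustrative only).
CALL_GRAPH = {
    "main": ["open_conn", "parse_args"],
    "open_conn": ["send", "on_timeout"],
    "parse_args": [],
    "send": [],
    "on_timeout": [],
}

# Hypothetical logging statement per method; absent methods log nothing.
LOG_EVENTS = {
    "open_conn": "Opening connection",
    "send": "Packet sent",
    "on_timeout": "Connection timed out",
}

# Stand-in for human knowledge: events that mark a path as anomalous.
ANOMALOUS_EVENTS = {"Connection timed out"}


def prune(graph, logs):
    """Keep only methods that log or can reach a logging method."""
    def reaches_log(method, stack):
        if method in stack:  # guard against recursive calls
            return False
        if method in logs:
            return True
        return any(reaches_log(c, stack | {method}) for c in graph.get(method, []))

    keep = {m for m in graph if reaches_log(m, frozenset())}
    return {m: [c for c in graph[m] if c in keep] for m in keep}


def enumerate_paths(graph, entry):
    """List every call chain from the entry method to a leaf."""
    paths = []

    def dfs(method, path):
        path = path + [method]
        callees = [c for c in graph[method] if c not in path]
        if not callees:
            paths.append(path)
            return
        for callee in callees:
            dfs(callee, path)

    dfs(entry, [])
    return paths


def label_paths(paths, logs, anomalous):
    """Turn each path into (log sequence, is_anomalous)."""
    labeled = []
    for path in paths:
        events = [logs[m] for m in path if m in logs]
        labeled.append((events, any(e in anomalous for e in events)))
    return labeled


def generate(labeled, n, anomaly_ratio, seed=0):
    """Sample n labeled sequences; anomaly_ratio controls the anomaly share."""
    rng = random.Random(seed)
    normal = [s for s in labeled if not s[1]]
    abnormal = [s for s in labeled if s[1]]
    out = []
    for _ in range(n):
        want_abnormal = bool(abnormal) and rng.random() < anomaly_ratio
        pool = abnormal if want_abnormal else (normal or abnormal)
        out.append(rng.choice(pool))
    return out


pruned = prune(CALL_GRAPH, LOG_EVENTS)
labeled = label_paths(enumerate_paths(pruned, "main"), LOG_EVENTS, ANOMALOUS_EVENTS)
for events, is_anomalous in generate(labeled, n=5, anomaly_ratio=0.3):
    print("anomaly" if is_anomalous else "normal", events)
```

In this sketch `anomaly_ratio` and `seed` stand in for the "controllable parameters" mentioned in the abstract; a real system would derive the graphs from static analysis of the program rather than from hand-written dictionaries.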

Related papers

Cross-System Software Log-based Anomaly Detection Using Meta-Learning [17.39262430769509]
AIOps tools have been developed to automate the process of log-based anomaly detection for software systems. Three practical challenges are widely recognized in this field: high data labeling costs, evolving logs in dynamic systems, and adaptability across different systems. We propose CroSysLog, an AIOps tool for log-event level anomaly detection, specifically designed in response to these challenges.
arXiv Detail & Related papers (2024-12-19T22:55:45Z)
LogLLM: Log-based Anomaly Detection Using Large Language Models [8.03646578793411]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs). LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences. Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
LogAI: A Library for Log Analytics and Intelligence [27.889928073709516]
LogAI is a one-stop open source library for log analytics and intelligence. It supports tasks such as log summarization, log clustering and log anomaly detection. LogAI provides a unified model interface and provides popular time-series, statistical learning and deep learning models.
arXiv Detail & Related papers (2023-01-31T05:08:39Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems. As of the time of writing of that paper, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
