AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
- URL: http://arxiv.org/abs/2308.09324v1
- Date: Fri, 18 Aug 2023 05:56:18 GMT
- Title: AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
- Authors: Yintong Huo, Yichen Li, Yuxin Su, Pinjia He, Zifan Xie, and Michael R. Lyu
- Abstract summary: AutoLog is the first automated log generation methodology for anomaly detection.
It generates run-time log sequences without actually running the system.
It propagates the anomaly label to each acquired execution path based on human knowledge.
- Score: 34.91789047641838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of modern computing systems has led to a growing interest
in informative run-time logs. Various log-based anomaly detection techniques
have been proposed to ensure software reliability. However, their
implementation in the industry has been limited due to the lack of high-quality
public log resources as training datasets.
While some log datasets are available for anomaly detection, they suffer from
limitations in (1) comprehensiveness of log events; (2) scalability over
diverse systems; and (3) flexibility of log utility. To address these
limitations, we propose AutoLog, the first automated log generation methodology
for anomaly detection. AutoLog uses program analysis to generate run-time log
sequences without actually running the system. AutoLog starts with probing
comprehensive logging statements associated with the call graphs of an
application. Then, it constructs execution graphs for each method after pruning
the call graphs to find log-related execution paths in a scalable manner.
Finally, AutoLog propagates the anomaly label to each acquired execution path
based on human knowledge. It generates flexible log sequences by walking along
the log execution paths with controllable parameters. Experiments on 50 popular
Java projects show that AutoLog acquires significantly more (9x-58x) log events
than existing log datasets from the same system, and generates log messages
much faster (15x) with a single machine than existing passive data collection
approaches. We hope AutoLog can facilitate the benchmarking and adoption of
automated log analysis techniques.
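The pipeline described in the abstract (prune the call graph to log-related methods, walk the remaining execution paths, and label each path from human knowledge) can be illustrated with a toy sketch. This is not AutoLog's implementation: the call graph, method names, log templates, the set of anomalous events, and the `max_steps` parameter are all hypothetical, and the walk picks a single callee per method, which is far simpler than real control flow.

```python
import random

# Hypothetical call graph: method -> callees. All names are made up.
CALL_GRAPH = {
    "main": ["open_conn", "render_ui"],
    "open_conn": ["send", "on_timeout"],
    "render_ui": ["draw"],
    "send": [],
    "on_timeout": [],
    "draw": [],
}
# Hypothetical logging statements (log event templates) found per method.
LOGS = {
    "open_conn": "Connecting to <*>",
    "send": "Sent <*> bytes",
    "on_timeout": "Connection timed out",
}
# Human-provided knowledge: log events that indicate an anomaly.
ANOMALOUS_EVENTS = {"Connection timed out"}


def log_related(method, graph, logs):
    """True if the method or any transitive callee has a logging statement."""
    stack, seen = [method], set()
    while stack:
        m = stack.pop()
        if m in logs:
            return True
        if m not in seen:
            seen.add(m)
            stack.extend(graph[m])
    return False


def prune(graph, logs):
    """Drop methods that can never emit a log event."""
    keep = {m for m in graph if log_related(m, graph, logs)}
    return {m: [c for c in graph[m] if c in keep] for m in keep}


def walk(graph, logs, entry, rng, max_steps=20):
    """Follow one path from entry, choosing one callee per method.

    max_steps bounds the walk so that cyclic graphs still terminate.
    """
    events, method = [], entry
    for _ in range(max_steps):
        if method in logs:
            events.append(logs[method])
        if not graph[method]:
            break
        method = rng.choice(graph[method])
    return events


def label(events, anomalous):
    """A path is anomalous if it contains any anomalous log event."""
    return "anomaly" if any(e in anomalous for e in events) else "normal"


def synthesize(n, seed=0):
    """Generate n labeled log sequences without running any real system."""
    rng = random.Random(seed)
    pruned = prune(CALL_GRAPH, LOGS)
    sequences = [walk(pruned, LOGS, "main", rng) for _ in range(n)]
    return [(seq, label(seq, ANOMALOUS_EVENTS)) for seq in sequences]


if __name__ == "__main__":
    for seq, tag in synthesize(5, seed=1):
        print(tag, seq)
```

In this sketch the number of sequences, the random seed, and the step bound play the role of the controllable parameters; `render_ui` and `draw` are pruned because no logging statement is reachable from them.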
Related papers
- LogLLM: Log-based Anomaly Detection Using Large Language Models [8.03646578793411]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs).
LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences.
Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
- Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
- LogAI: A Library for Log Analytics and Intelligence [27.889928073709516]
LogAI is a one-stop open source library for log analytics and intelligence.
It supports tasks such as log summarization, log clustering and log anomaly detection.
LogAI provides a unified model interface along with popular time-series, statistical learning, and deep learning models.
arXiv Detail & Related papers (2023-01-31T05:08:39Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets.
In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems.
At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog, which uses a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.