Loghub: A Large Collection of System Log Datasets for AI-driven Log
Analytics
- URL: http://arxiv.org/abs/2008.06448v3
- Date: Wed, 13 Sep 2023 01:23:14 GMT
- Authors: Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R. Lyu
- Abstract summary: We have collected and released loghub, a large collection of system log datasets.
In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems.
As of this writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
- Score: 40.96246300489472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logs have been widely adopted in software system development and maintenance
because of the rich runtime information they record. In recent years, the
increase in software size and complexity has led to rapid growth in the
volume of logs. To handle these large volumes of logs efficiently and
effectively, a line of research focuses on developing intelligent and automated
log analysis techniques. However, only a few of these techniques have reached
successful deployments in industry due to the lack of public log datasets and
open benchmarking upon them. To fill this significant gap and facilitate more
research on AI-driven log analytics, we have collected and released loghub, a
large collection of system log datasets. In particular, loghub provides 19
real-world log datasets collected from a wide range of software systems,
including distributed systems, supercomputers, operating systems, mobile
systems, server applications, and standalone software. In this paper, we
summarize the statistics of these datasets, introduce some practical usage
scenarios of the loghub datasets, and present our benchmarking results on
loghub to benefit researchers and practitioners in this field. As of this
writing, the loghub datasets have been downloaded
roughly 90,000 times in total by hundreds of organizations from both industry
and academia. The loghub datasets are available at
https://github.com/logpai/loghub.
Related papers
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
- AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [34.91789047641838]
AutoLog is the first automated log generation methodology for anomaly detection.
It generates run-time log sequences without actually running the system.
It propagates the anomaly label to each acquired execution path based on human knowledge.
arXiv Detail & Related papers (2023-08-18T05:56:18Z)
- Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
- On the Effectiveness of Log Representation for Log-based Anomaly Detection [12.980238412281471]
This work investigates and compares the commonly adopted log representation techniques from previous log analysis research.
We select six log representation techniques and evaluate them with seven ML models and four public log datasets.
We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques.
arXiv Detail & Related papers (2023-08-17T02:18:59Z)
- LogAI: A Library for Log Analytics and Intelligence [27.889928073709516]
LogAI is a one-stop open source library for log analytics and intelligence.
It supports tasks such as log summarization, log clustering and log anomaly detection.
LogAI provides a unified model interface and provides popular time-series, statistical learning and deep learning models.
arXiv Detail & Related papers (2023-01-31T05:08:39Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)