Log Parsing Evaluation in the Era of Modern Software Systems
- URL: http://arxiv.org/abs/2308.09003v1
- Date: Thu, 17 Aug 2023 14:19:22 GMT
- Title: Log Parsing Evaluation in the Era of Modern Software Systems
- Authors: Stefan Petrescu, Floris den Hengst, Alexandru Uta, Jan S. Rellermeyer
- Abstract summary: We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
- Score: 47.370291246632114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the complexity and size of modern software systems, the amount of logs
generated is tremendous. Hence, it is infeasible to manually investigate these
data in a reasonable time, thereby requiring automating log analysis to derive
insights about the functioning of the systems. Motivated by an industry
use case, we zoom in on one integral part of automated log analysis, log
parsing, which is the prerequisite to deriving any insights from logs. Our
investigation reveals problematic aspects within the log parsing field,
particularly its inefficiency in handling heterogeneous real-world logs. We
show this by assessing the 14 most-recognized log parsing approaches in the
literature using (i) nine publicly available datasets, (ii) one dataset
comprised of combined publicly available data, and (iii) one dataset generated
within the infrastructure of a large bank. Subsequently, toward improving log
parsing robustness in real-world production scenarios, we propose a tool,
Logchimera, that enables estimating log parsing performance in industry
contexts through generating synthetic log data that resemble industry logs. Our
contributions serve as a foundation to consolidate past research efforts,
facilitate future research advancements, and establish a strong link between
research and industry log parsing.
Related papers
- ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units [34.344687402936835]
We propose ULog, an unsupervised method for efficient and off-the-shelf log parsing.
We refer to such groups of logs as Log Contrastive Units (LCUs).
ULog crafts a novel parsing prompt for LLMs to identify contrastive patterns and extract meaningful log structures from LCUs.
arXiv Detail & Related papers (2024-06-11T11:32:01Z)
- Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
- On the Effectiveness of Log Representation for Log-based Anomaly Detection [12.980238412281471]
This work investigates and compares the commonly adopted log representation techniques from previous log analysis research.
We select six log representation techniques and evaluate them with seven ML models and four public log datasets.
We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques.
arXiv Detail & Related papers (2023-08-17T02:18:59Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets.
In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems.
At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)