Log Parsing Evaluation in the Era of Modern Software Systems
- URL: http://arxiv.org/abs/2308.09003v1
- Date: Thu, 17 Aug 2023 14:19:22 GMT
- Title: Log Parsing Evaluation in the Era of Modern Software Systems
- Authors: Stefan Petrescu, Floris den Hengst, Alexandru Uta, Jan S. Rellermeyer
- Abstract summary: We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
- Score: 47.370291246632114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the complexity and size of modern software systems, the amount of logs
generated is tremendous. Hence, it is infeasible to manually investigate these
data in a reasonable time, thereby requiring automating log analysis to derive
insights about the functioning of the systems. Motivated by an industry
use-case, we zoom in on one integral part of automated log analysis, log
parsing, which is the prerequisite to deriving any insights from logs. Our
investigation reveals problematic aspects within the log parsing field,
particularly its inefficiency in handling heterogeneous real-world logs. We
show this by assessing the 14 most-recognized log parsing approaches in the
literature using (i) nine publicly available datasets, (ii) one dataset
comprised of combined publicly available data, and (iii) one dataset generated
within the infrastructure of a large bank. Subsequently, toward improving log
parsing robustness in real-world production scenarios, we propose a tool,
Logchimera, that enables estimating log parsing performance in industry
contexts through generating synthetic log data that resemble industry logs. Our
contributions serve as a foundation to consolidate past research efforts,
facilitate future research advancements, and establish a strong link between
research and industry log parsing.
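To make the term concrete: log parsing turns a raw log message into a constant template plus its variable parameters. The following is a minimal sketch under a deliberately naive assumption (any whitespace-delimited token containing a digit is a variable). It is not the method of any parser evaluated in the paper, nor of Logchimera; the function names and sample lines are hypothetical and only illustrate the task.

```python
import re
from collections import defaultdict

# Assumption for illustration only: a token that contains a digit is a variable.
_HAS_DIGIT = re.compile(r"\d")


def parse_line(line):
    """Split one raw log message into (template, parameters)."""
    template_tokens, params = [], []
    for token in line.split():
        if _HAS_DIGIT.search(token):
            template_tokens.append("<*>")  # placeholder for the variable part
            params.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), params


def group_by_template(lines):
    """Group the extracted parameters of each message under its template."""
    groups = defaultdict(list)
    for line in lines:
        template, params = parse_line(line)
        groups[template].append(params)
    return dict(groups)


# Made-up sample messages, not taken from any dataset in the paper.
SAMPLE = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Disk usage at 91% on node-3",
]

if __name__ == "__main__":
    for template, params in group_by_template(SAMPLE).items():
        print(template, params)
```

A rule this simple breaks as soon as variables contain no digits or constants do, which is one way heterogeneous real-world logs can defeat parsers tuned to a single format.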
Related papers
- LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce LogParser-LLM, a novel log parser integrated with LLM capabilities.
We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs.
Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z)
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
- On the Effectiveness of Log Representation for Log-based Anomaly Detection [12.980238412281471]
This work investigates and compares the commonly adopted log representation techniques from previous log analysis research.
We select six log representation techniques and evaluate them with seven ML models and four public log datasets.
We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques.
arXiv Detail & Related papers (2023-08-17T02:18:59Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets.
In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems.
At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)