Related papers: Log Parsing Evaluation in the Era of Modern Software Systems

Log Parsing Evaluation in the Era of Modern Software Systems

URL: http://arxiv.org/abs/2308.09003v1
Date: Thu, 17 Aug 2023 14:19:22 GMT
Title: Log Parsing Evaluation in the Era of Modern Software Systems
Authors: Stefan Petrescu, Floris den Hengst, Alexandru Uta, Jan S. Rellermeyer
Abstract summary: We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
Score: 47.370291246632114
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to the complexity and size of modern software systems, the amount of logs generated is tremendous. Hence, it is infeasible to manually investigate these data in a reasonable time, thereby requiring automating log analysis to derive insights about the functioning of the systems. Motivated by an industry use-case, we zoom-in on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We show this by assessing the 14 most-recognized log parsing approaches in the literature using (i) nine publicly available datasets, (ii) one dataset comprised of combined publicly available data, and (iii) one dataset generated within the infrastructure of a large bank. Subsequently, toward improving log parsing robustness in real-world production scenarios, we propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts through generating synthetic log data that resemble industry logs. Our contributions serve as a foundation to consolidate past research efforts, facilitate future research advancements, and establish a strong link between research and industry log parsing.

Related papers

SoK: LLM-based Log Parsing [2.2779174914142346]
This paper systematically reviews 29 large language models (LLMs)-based log parsing methods. We analyze the learning and prompt-engineering paradigms employed, efficiency- and effectiveness-enhancing techniques, and the role of LLMs in the parsing process.
arXiv Detail & Related papers (2025-04-07T09:41:04Z)
LogBabylon: A Unified Framework for Cross-Log File Integration and Analysis [6.185113951720912]
LogBabylon is a central log data consolidating solution that leverages Large Language Models (LLMs) integrated with Retrieval-Augmented Generation (RAG) technology. LogBabylon consolidates diverse log sources and enhances the extracted information's accuracy and relevancy. Its capabilities extend to generating context-aware insights, offering an invaluable tool for continuous monitoring, performance optimization, and security assurance.
arXiv Detail & Related papers (2024-12-16T21:36:03Z)
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce Log-LLM, a novel log integrated with LLM capabilities. We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs. Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z)
HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs. We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. We conduct a thorough re-evaluation of 15 state-of-the-art logs in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
On the Effectiveness of Log Representation for Log-based Anomaly Detection [12.980238412281471]
This work investigates and compares the commonly adopted log representation techniques from previous log analysis research. We select six log representation techniques and evaluate them with seven ML models and four public log datasets. We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques.
arXiv Detail & Related papers (2023-08-17T02:18:59Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems. Up to the time of this paper writing, the loghub datasets have been downloaded for roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.