Related papers: AL-Bench: A Benchmark for Automatic Logging

AL-Bench: A Benchmark for Automatic Logging

URL: http://arxiv.org/abs/2502.03160v3
Date: Wed, 02 Apr 2025 04:13:04 GMT
Title: AL-Bench: A Benchmark for Automatic Logging
Authors: Boyin Tan, Junjielong Xu, Zhouruixing Zhu, Pinjia He,
Abstract summary: We introduce AL-Bench, a benchmark designed specifically for automatic logging tools.<n> AL-Bench includes a large-scale, high-quality, diverse dataset collected from 10 widely recognized projects.<n>It provides a run-time perspective of logging quality in addition to the traditional static evaluation at source code level.
Score: 3.8293110324859505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Logging, the practice of inserting log statements into source code, is critical for improving software reliability. Recently, language model-based techniques have been developed to automate log statement generation based on input code. While these tools show promising results in prior studies, the fairness of their results comparisons is not guaranteed due to the use of ad hoc datasets. In addition, existing evaluation approaches exclusively dependent on code similarity metrics fail to capture the impact of code diff on runtime logging behavior, as minor code modifications can induce program uncompilable and substantial discrepancies in log output semantics. To enhance the consistency and reproducibility of logging evaluation, we introduce AL-Bench, a comprehensive benchmark designed specifically for automatic logging tools. AL-Bench includes a large-scale, high-quality, diverse dataset collected from 10 widely recognized projects with varying logging requirements. Moreover, it introduces a novel dynamic evaluation methodology to provide a run-time perspective of logging quality in addition to the traditional static evaluation at source code level. Specifically, AL-Bench not only evaluates the similarity between the oracle and predicted log statements in source code, but also evaluates the difference between the log files printed by both log statements during runtime. AL-Bench reveals significant limitations in existing static evaluation, as all logging tools show average accuracy drops of 37.49%, 23.43%, and 15.80% in predicting log position, level, and message compared to their reported results. Furthermore, with dynamic evaluation, AL-Bench reveals that 20.1%-83.6% of these generated log statements are unable to compile. Moreover, the best-performing tool achieves only 21.32% cosine similarity between the log files of the oracle and generated log statements.

Related papers

AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms [54.99368693313797]
Existing benchmarks test only individual languages/tools, so the performance numbers are not directly comparable.<n>We address this gap with AlgoVeri, a benchmark that evaluates vericoding of $77$ classical algorithms in Dafny, Verus, and Lean.
arXiv Detail & Related papers (2026-02-10T06:58:26Z)
PDLogger: Automated Logging Framework for Practical Software Development [7.860311994179783]
Existing automated logging techniques focus on isolated sub-tasks.<n>PDLogger is the first end-to-end log generation technique expressly designed for practical, multi-log scenarios.<n>It improves log-position precision by 139.0 percent, F1 by 69.2 percent, level accuracy by 82.3 percent, variable precision by 131.8 percent, and message quality (BERTScore) by 65.7 percent.
arXiv Detail & Related papers (2025-07-26T13:35:57Z)
Go Static: Contextualized Logging Statement Generation [38.15795803230719]
SCLogger is a contextualized logging statement generation approach with inter-method static contexts. SCLogger surpasses the state-of-the-art approach by 8.7% in logging position accuracy, 32.1% in level accuracy, 19.6% in variable precision, and 138.4% in text BLEU-4 score.
arXiv Detail & Related papers (2024-02-20T12:22:59Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
FastLog: An End-to-End Method to Efficiently Generate and Insert Logging Statements [5.80502312468937]
We propose FastLog, which can support the complete logging statement generation and insertion activity. FastLog first predicts the insertion position in the finest token level, and then generates a complete logging statement to insert. A comprehensive empirical analysis shows that our method outperforms the state-of-the-art approach in both efficiency and output quality.
arXiv Detail & Related papers (2023-11-06T04:26:20Z)
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. We conduct a thorough re-evaluation of 15 state-of-the-art logs in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
SKTR: Trace Recovery from Stochastically Known Logs [7.882975068446842]
Developments in machine learning together with the increasing usage of sensor data challenge the reliance on deterministic logs. In this work we formulate the task of generating a deterministic log fromally known logs that is as faithful to reality as possible. An effective trace recovery algorithm would be a powerful aid for maintaining credible process mining tools for uncertain settings.
arXiv Detail & Related papers (2022-06-25T15:29:20Z)
Data-Driven Approach for Log Instruction Quality Assessment [59.04636530383049]
There are no widely adopted guidelines on how to write log instructions with good quality properties. We identify two quality properties: 1) correct log level assignment assessing the correctness of the log level, and 2) sufficient linguistic structure assessing the minimal richness of the static text necessary for verbose event description. Our approach correctly assesses log level assignments with an accuracy of 0.88, and the sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines.
arXiv Detail & Related papers (2022-04-06T07:02:23Z)
Borrowing from Similar Code: A Deep Learning NLP-Based Approach for Log Statement Automation [0.0]
We introduce an updated and improved log-aware code-clone detection method to predict the location of logging statements. We incorporate natural language processing (NLP) and deep learning methods to automate the log statements' description prediction. Our analysis shows that our hybrid NLP and code-clone detection approach (NLP CC'd) outperforms conventional clone detectors in finding log statement locations.
arXiv Detail & Related papers (2021-12-02T14:03:49Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
Leveraging Code Clones and Natural Language Processing for Log Statement Prediction [0.0]
This research aims to predict the log statements by utilizing source code clones and natural language processing (NLP) Our work demonstrates the effectiveness of log-aware clone detection for automated log location and description prediction.
arXiv Detail & Related papers (2021-09-08T18:17:45Z)
Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations. We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.