Related papers: End-to-End Automated Logging via Multi-Agent Framework

End-to-End Automated Logging via Multi-Agent Framework

URL: http://arxiv.org/abs/2511.18528v1
Date: Sun, 23 Nov 2025 16:45:30 GMT
Title: End-to-End Automated Logging via Multi-Agent Framework
Authors: Renyi Zhong, Yintong Huo, Wenwei Gu, Yichen Li, Michael R. Lyu,
Abstract summary: Existing automated logging tools often overlook the fundamental whether-to-log decision.<n>We propose Autologger, a novel hybrid framework that addresses the complete end-to-end logging pipeline.<n>In an end-to-end setting, Autologger improves the overall quality of generated logging statements by 16.13% over the strongest baseline.
Score: 35.280199418859034
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software logging is critical for system observability, yet developers face a dual crisis of costly overlogging and risky underlogging. Existing automated logging tools often overlook the fundamental whether-to-log decision and struggle with the composite nature of logging. In this paper, we propose Autologger, a novel hybrid framework that addresses the complete the end-to-end logging pipeline. Autologger first employs a fine-tuned classifier, the Judger, to accurately determine if a method requires new logging statements. If logging is needed, a multi-agent system is activated. The system includes specialized agents: a Locator dedicated to determining where to log, and a Generator focused on what to log. These agents work together, utilizing our designed program analysis and retrieval tools. We evaluate Autologger on a large corpus from three mature open-source projects against state-of-the-art baselines. Our results show that Autologger achieves 96.63\% F1-score on the crucial whether-to-log decision. In an end-to-end setting, Autologger improves the overall quality of generated logging statements by 16.13\% over the strongest baseline, as measured by an LLM-as-a-judge score. We also demonstrate that our framework is generalizable, consistently boosting the performance of various backbone LLMs.

Related papers

PDLogger: Automated Logging Framework for Practical Software Development [7.860311994179783]
Existing automated logging techniques focus on isolated sub-tasks.<n>PDLogger is the first end-to-end log generation technique expressly designed for practical, multi-log scenarios.<n>It improves log-position precision by 139.0 percent, F1 by 69.2 percent, level accuracy by 82.3 percent, variable precision by 131.8 percent, and message quality (BERTScore) by 65.7 percent.
arXiv Detail & Related papers (2025-07-26T13:35:57Z)
LogLLM: Log-based Anomaly Detection Using Large Language Models [7.7704116297749675]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs)<n>LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences.<n>Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. We conduct a thorough re-evaluation of 15 state-of-the-art logs in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [34.91789047641838]
AutoLog is the first automated log generation methodology for anomaly detection. It generates run-time log sequences without actually running the system. It propagates the anomaly label to each acquired execution path based on human knowledge.
arXiv Detail & Related papers (2023-08-18T05:56:18Z)
Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
Data-Driven Approach for Log Instruction Quality Assessment [59.04636530383049]
There are no widely adopted guidelines on how to write log instructions with good quality properties. We identify two quality properties: 1) correct log level assignment assessing the correctness of the log level, and 2) sufficient linguistic structure assessing the minimal richness of the static text necessary for verbose event description. Our approach correctly assesses log level assignments with an accuracy of 0.88, and the sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines.
arXiv Detail & Related papers (2022-04-06T07:02:23Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
On Automatic Parsing of Log Records [0.0]
We create a tool that generates synthetic Apache log records which we used to train recurrent-neural-network-based MT models. Models' evaluation on real-world logs shows that the models can learn Apache log format and parse individual log records.
arXiv Detail & Related papers (2021-02-12T00:27:41Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.