Related papers: LogUpdater: Automated Detection and Repair of Specific Defects in Logging Statements

URL: http://arxiv.org/abs/2408.03101v2
Date: Tue, 22 Apr 2025 11:58:54 GMT
Title: LogUpdater: Automated Detection and Repair of Specific Defects in Logging Statements
Authors: Renyi Zhong, Yichen Li, Jinxi Kuang, Wenwei Gu, Yintong Huo, Michael R. Lyu
Abstract summary: Developers use logging statements to track software runtime behaviors and system status. Unclear or misleading logs can hide true execution patterns and hinder software maintenance. We conduct a study to identify four logging statement defect types by analyzing log-centric changes. We introduce LogUpdater, a framework for automatically detecting and updating these log defects.
Score: 29.631530836349505
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Developers use logging statements to track software runtime behaviors and system status. Yet, unclear or misleading logs can hide true execution patterns and hinder software maintenance. Current research on logging statement issues is limited, often only spotting one defect type and relying on manual corrections instead of automation. To bridge this gap, we conduct a study to identify four logging statement defect types by analyzing log-centric changes. Then we introduce LogUpdater, a two-stage framework for automatically detecting and updating these log defects. In the offline phase, LogUpdater builds a classifier using synthetic defective logs to spot defect types. During online testing, this classifier assesses if and how logs in code snippets need improvement. LogUpdater then uses type-aware prompts from past logging updates to suggest fixes via a recommendation framework based on LLMs. Results show strong defect detection with an F1 score of 0.625. It also greatly improves static text and dynamic variable suggestions by 48.12% and 24.90%, respectively. LogUpdater successfully recommends updates 61.49% of the time on new projects. We reported 40 problematic logs and their fixes on GitHub, leading to 25 merged changes across 11 projects.
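
The two-stage flow described in the abstract (detect a defect type, then build a type-aware prompt from past updates and pass it to an LLM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the defect labels `type_1` to `type_4` are placeholders because the abstract does not name the four types, and `toy_detect`, `PastUpdate`, `build_prompt`, and `recommend_update` are hypothetical names. The detection rule is a trivial stand-in for the trained classifier, and the LLM is passed in as a plain callable.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Placeholder labels: the abstract reports four defect types but does not name them.
DEFECT_TYPES = ("type_1", "type_2", "type_3", "type_4")


@dataclass
class PastUpdate:
    """One historical logging update, tagged with a (placeholder) defect type."""
    defect_type: str
    before: str
    after: str


def toy_detect(snippet: str) -> Optional[str]:
    """Stand-in for the trained classifier (assumption, not the paper's model).

    Illustrative rule only: a log message with fewer than two words is
    flagged as "type_1"; anything else is treated as needing no update.
    """
    match = re.search(r'log\.\w+\("([^"]*)"', snippet)
    if match is None:
        return None
    message = match.group(1)
    return "type_1" if len(message.split()) < 2 else None


def build_prompt(snippet: str, defect_type: str,
                 history: List[PastUpdate], max_examples: int = 2) -> str:
    """Assemble a type-aware prompt from past updates of the same defect type."""
    examples = [u for u in history if u.defect_type == defect_type][:max_examples]
    lines = [f"The logging statement below has defect type: {defect_type}."]
    for update in examples:
        lines.append(f"Example before: {update.before}")
        lines.append(f"Example after: {update.after}")
    lines.append(f"Code to update: {snippet}")
    return "\n".join(lines)


def recommend_update(snippet: str,
                     detect: Callable[[str], Optional[str]],
                     history: List[PastUpdate],
                     llm: Callable[[str], str]) -> Optional[str]:
    """Detect first; only call the LLM when a defect type is reported."""
    defect_type = detect(snippet)
    if defect_type is None:
        return None  # the classifier sees nothing to improve
    return llm(build_prompt(snippet, defect_type, history))
```

Keeping the detector and the LLM as injected callables means either stage can be swapped for a real model without changing the control flow.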

Related papers

Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis [29.800380941293277]
Engineers prioritize two categories of log information for diagnosis: fault-indicating descriptions and fault-indicating parameters. We propose an approach to automatically extract fault-indicating information from logs for fault diagnosis, named LoFI. LoFI outperforms all baseline methods by a significant margin, achieving an absolute improvement of 25.8~37.9 in F1 over the best baseline method, ChatGPT.
arXiv Detail & Related papers (2024-09-20T15:00:47Z)
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce LogParser-LLM, a novel log parser integrated with LLM capabilities. We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs. Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z)
HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
Easy over Hard: A Simple Baseline for Test Failures Causes Prediction [13.759493107661834]
NCChecker is a tool to automatically identify the failure causes for failed test logs. Our approach has three main stages: log abstraction, lookup table construction, and failure causes prediction. We have developed a prototype and evaluated our tool on a real-world industrial dataset with more than 10K test logs.
arXiv Detail & Related papers (2024-05-05T12:59:37Z)
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [34.91789047641838]
AutoLog is the first automated log generation methodology for anomaly detection. It generates run-time log sequences without actually running the system. It propagates the anomaly label to each acquired execution path based on human knowledge.
arXiv Detail & Related papers (2023-08-18T05:56:18Z)
EvLog: Identifying Anomalous Logs over Software Evolution [31.46106509190191]
We propose a novel unsupervised approach named Evolving Log extractor (EvLog) to process logs without parsing. EvLog implements an anomaly discriminator with an attention mechanism to identify the anomalous logs and avoid the issue brought by the unstable sequence. EvLog has shown effectiveness in two real-world system evolution log datasets with an average F1 score of 0.955 and 0.847 in the intra-version setting and inter-version setting, respectively.
arXiv Detail & Related papers (2023-06-02T12:58:00Z)
Data-Driven Approach for Log Instruction Quality Assessment [59.04636530383049]
There are no widely adopted guidelines on how to write log instructions with good quality properties. We identify two quality properties: 1) correct log level assignment assessing the correctness of the log level, and 2) sufficient linguistic structure assessing the minimal richness of the static text necessary for verbose event description. Our approach correctly assesses log level assignments with an accuracy of 0.88, and the sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines.
arXiv Detail & Related papers (2022-04-06T07:02:23Z)
Borrowing from Similar Code: A Deep Learning NLP-Based Approach for Log Statement Automation [0.0]
We introduce an updated and improved log-aware code-clone detection method to predict the location of logging statements. We incorporate natural language processing (NLP) and deep learning methods to automate the log statements' description prediction. Our analysis shows that our hybrid NLP and code-clone detection approach (NLP CC'd) outperforms conventional clone detectors in finding log statement locations.
arXiv Detail & Related papers (2021-12-02T14:03:49Z)
LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations. We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.