Prompting for Automatic Log Template Extraction
- URL: http://arxiv.org/abs/2307.09950v3
- Date: Thu, 29 Feb 2024 09:33:13 GMT
- Title: Prompting for Automatic Log Template Extraction
- Authors: Junjielong Xu, Ruichun Yang, Yintong Huo, Chengyu Zhang, and Pinjia He
- Abstract summary: DivLog is an effective log parsing framework based on the in-context learning (ICL) ability of large language models (LLMs).
By mining the semantics of examples in the prompt, DivLog generates a target log template in a training-free manner.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Log parsing, which involves log template extraction from semi-structured logs
to produce structured logs, is the first and the most critical step in
automated log analysis. However, current log parsers suffer from limited
effectiveness for two reasons. First, traditional data-driven log parsers
solely rely on heuristics or handcrafted features designed by domain experts,
which may not consistently perform well on logs from diverse systems. Second,
existing supervised log parsers require model tuning, which is often limited to
fixed training samples and causes sub-optimal performance across the entire log
source. To address these limitations, we propose DivLog, an effective log parsing
framework based on the in-context learning (ICL) ability of large language
models (LLMs). Specifically, before log parsing, DivLog samples a small amount
of offline logs as candidates by maximizing their diversity. Then, during log
parsing, DivLog selects five appropriate labeled candidates as examples for
each target log and constructs them into a prompt. By mining the semantics of
examples in the prompt, DivLog generates a target log template in a
training-free manner. In addition, we design a straightforward yet effective
prompt format to extract the output and enhance the quality of the generated
log templates. We conducted experiments on 16 widely-used public datasets. The
results show that DivLog achieves (1) 98.1% Parsing Accuracy, (2) 92.1%
Precision Template Accuracy, and (3) 92.9% Recall Template Accuracy on average,
exhibiting state-of-the-art performance.
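The pipeline the abstract describes, sampling diverse labeled candidates offline, then selecting five of them as in-context examples per target log and laying them out in a prompt, can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the token-overlap Jaccard similarity, the greedy max-min sampler, and the prompt wording are all assumptions made here for the sketch.

```python
# Hypothetical sketch of a DivLog-style ICL pipeline: diversity-maximizing
# candidate sampling, then per-target example selection and prompt building.
# Jaccard similarity and greedy max-min sampling are illustrative stand-ins
# for the paper's actual algorithms.

def tokens(log):
    return set(log.lower().split())

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def sample_diverse(logs, k):
    """Greedy max-min selection: repeatedly add the log least similar to
    everything chosen so far, maximizing diversity of the candidate set."""
    chosen = [logs[0]]
    while len(chosen) < k:
        best = max((l for l in logs if l not in chosen),
                   key=lambda l: min(1 - jaccard(l, c) for c in chosen))
        chosen.append(best)
    return chosen

def build_prompt(target, labeled, shots=5):
    """Pick the `shots` labeled candidates most similar to the target log
    and lay them out as (log, template) demonstration pairs."""
    examples = sorted(labeled, key=lambda p: jaccard(target, p[0]),
                      reverse=True)[:shots]
    lines = ["Extract the log template. Replace variables with <*>.", ""]
    for log, template in examples:
        lines += [f"Log: {log}", f"Template: {template}", ""]
    lines += [f"Log: {target}", "Template:"]
    return "\n".join(lines)
```

The resulting prompt string would be sent to an LLM, whose completion after the final `Template:` is taken as the extracted template, which is what makes the approach training-free.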
Related papers
- HELP: Hierarchical Embeddings-based Log Parsing
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
- Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs
We propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z)
- Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
- LogPTR: Variable-Aware Log Parsing with Pointer Network
We propose LogPTR, the first end-to-end variable-aware log parser that can extract the static and dynamic parts of logs and identify the categories of variables.
We have performed extensive experiments on 16 public log datasets, and the results show that LogPTR outperforms state-of-the-art log parsers.
arXiv Detail & Related papers (2024-01-11T15:41:21Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
We propose a unified Transformer-based framework for log anomaly detection (LogFormer) to improve generalization across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
- Log Parsing Evaluation in the Era of Modern Software Systems
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
- Data-Driven Approach for Log Instruction Quality Assessment
There are no widely adopted guidelines on how to write log instructions with good quality properties.
We identify two quality properties: 1) correct log level assignment assessing the correctness of the log level, and 2) sufficient linguistic structure assessing the minimal richness of the static text necessary for verbose event description.
Our approach correctly assesses log level assignments with an accuracy of 0.88, and the sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines.
arXiv Detail & Related papers (2022-04-06T07:02:23Z)
- Self-Supervised Log Parsing
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
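The masked-language-modeling formulation above can be illustrated with a deliberately simplified sketch: a token counted across the corpus stands in for a token a trained model would predict confidently when masked. Tokens that are stable across logs are kept as static template text; unpredictable ones become variable placeholders. The frequency threshold and whitespace tokenization are assumptions of this sketch, not NuLog's design.

```python
# Illustrative sketch of self-supervised template extraction in the spirit
# of NuLog: decide token by token whether a masked token is predictable
# from the rest of the corpus. A token-frequency table stands in for the
# masked language model the paper actually trains.

from collections import Counter

def extract_template(logs, target, threshold=0.5):
    """Keep a token as static text if it occurs in at least `threshold`
    of the logs (a crude proxy for 'easy to predict when masked');
    otherwise replace it with the <*> variable placeholder."""
    counts = Counter(tok for log in logs for tok in set(log.split()))
    n = len(logs)
    out = []
    for tok in target.split():
        out.append(tok if counts[tok] / n >= threshold else "<*>")
    return " ".join(out)
```

In the real system the predictability signal comes from a Transformer's reconstruction of masked tokens rather than raw frequency, but the decision structure, static where predictable, variable where not, is the same.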