LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
- URL: http://arxiv.org/abs/2408.01585v3
- Date: Mon, 18 Nov 2024 05:18:51 GMT
- Title: LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
- Authors: Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen
- Abstract summary: This paper introduces LibreLog, an unsupervised log parsing approach that enhances privacy and reduces operational costs while achieving state-of-the-art parsing accuracy.
Our evaluation on LogHub-2.0 shows that LibreLog achieves 25% higher parsing accuracy and parses logs 2.7 times faster than state-of-the-art LLM-based parsers.
- Abstract: Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often experience decreased accuracy when processing logs that deviate from the predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. To overcome these limitations, this paper introduces LibreLog, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. LibreLog first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity-scoring-based retrieval-augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that LibreLog achieves 25% higher parsing accuracy and parses logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, LibreLog addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
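To make the abstract's moving parts concrete, here is a minimal, hypothetical Python sketch of two of them: Jaccard-similarity-based selection of diverse logs and the log template memory. The helper names (`jaccard`, `select_diverse`, `TemplateMemory`), whitespace tokenization, and the `<*>` placeholder convention are illustrative assumptions, not LibreLog's actual implementation:

```python
import re

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two log lines."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 1.0

def select_diverse(logs: list[str], k: int = 3) -> list[str]:
    """Greedily pick up to k mutually dissimilar logs from a group, so the
    LLM sees which tokens vary (dynamic variables) and which repeat (static text)."""
    selected = [logs[0]]
    while len(selected) < k:
        candidates = [log for log in logs if log not in selected]
        if not candidates:
            break
        # Farthest-point heuristic: lowest maximum similarity to the picks so far.
        selected.append(min(candidates,
                            key=lambda log: max(jaccard(log, s) for s in selected)))
    return selected

class TemplateMemory:
    """Stores parsed templates; a match here avoids another LLM query."""
    def __init__(self) -> None:
        self._patterns: list[tuple[str, re.Pattern]] = []

    def add(self, template: str) -> None:
        # Turn "<*>" placeholders into non-greedy wildcards; escape the rest.
        regex = re.escape(template).replace(re.escape("<*>"), "(.*?)")
        self._patterns.append((template, re.compile(regex)))

    def match(self, log: str) -> str | None:
        for template, pattern in self._patterns:
            if pattern.fullmatch(log):
                return template
        return None
```

A parser built along these lines would check `TemplateMemory.match` first, and only on a miss select diverse logs from the group, query the LLM for a template, refine it via self-reflection, and store the result for future logs.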
Related papers
- Studying and Benchmarking Large Language Models For Log Level Suggestion [49.176736212364496]
Large Language Models (LLMs) have become a focal point of research across various domains.
This paper investigates the impact of model characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion.
arXiv Detail & Related papers (2024-10-11T03:52:17Z)
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
- LUNAR: Unsupervised LLM-based Log Parsing [34.344687402936835]
We propose LUNAR, an unsupervised LLM-based method for efficient and off-the-shelf log parsing.
Our key insight is that while LLMs may struggle with direct log parsing, their performance can be significantly enhanced through comparative analysis (illustrated by the sketch after this list).
Experiments on large-scale public datasets demonstrate that LUNAR significantly outperforms state-of-the-art log parsers in terms of accuracy and efficiency.
arXiv Detail & Related papers (2024-06-11T11:32:01Z)
- Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z)
- Log Parsing with Self-Generated In-Context Learning and Self-Correction [15.93927602769091]
Despite the variety of log parsing methods that have been proposed, their performance on evolving log data remains unsatisfactory due to reliance on human-crafted rules or learning-based models with limited training data.
We propose AdaParser, an effective and adaptive log parsing framework using LLMs with self-generated in-context learning (SG-ICL) and self-correction.
arXiv Detail & Related papers (2024-06-05T15:31:43Z)
- Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
- Optimizing LLM Queries in Relational Workloads [58.254894049950366]
We show how to optimize Large Language Model (LLM) inference for analytical workloads that invoke LLMs within relational queries.
We implement these optimizations in Apache Spark, with vLLM as the model serving backend.
We achieve up to 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
- LILAC: Log Parsing using LLMs with Adaptive Parsing Cache [38.04960745458878]
We propose LILAC, the first practical log parsing framework using large language models (LLMs) with an adaptive parsing cache.
LLMs' lack of specialized log parsing capabilities currently hinders their parsing accuracy.
We show LILAC outperforms state-of-the-art methods by 69.5% in terms of the average F1 score of template accuracy.
arXiv Detail & Related papers (2023-10-03T04:46:59Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
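Several of the entries above (LUNAR's comparative analysis, LibreLog's similarity-based retrieval) rest on the same observation: contrasting a few logs that share a format makes the split between static text and dynamic variables visible. A deliberately naive Python sketch of that observation, not any paper's actual algorithm:

```python
def naive_template(logs: list[str]) -> str:
    """Token-wise comparison of logs from one group: columns whose tokens
    agree across all lines are static text; disagreeing columns become <*>.
    Assumes every log splits into the same number of whitespace tokens."""
    rows = [log.split() for log in logs]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("logs must tokenize to the same length")
    return " ".join(col[0] if len(set(col)) == 1 else "<*>"
                    for col in zip(*rows))

# naive_template(["Connection from 10.0.0.1 closed",
#                 "Connection from 10.0.0.2 closed"])
# -> "Connection from <*> closed"
```

Real parsers must additionally handle logs of differing lengths and avoid over-generalizing from too few samples, which is where the LLM-based refinement steps in these papers come in.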