Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging
- URL: http://arxiv.org/abs/2402.18205v2
- Date: Sat, 2 Mar 2024 03:47:13 GMT
- Title: Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging
- Authors: Wei Zhang, Hongcheng Guo, Anjie Le, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Shi Xu, Runqiang Zang, Liangfan Zheng, Bo Zhang
- Abstract summary: We introduce a cutting-edge Log parsing framework with Entropy sampling and Chain-of-Thought Merging (Lemur).
We propose a novel sampling method inspired by information entropy, which efficiently clusters typical logs.
Lemur achieves state-of-the-art performance and impressive efficiency.
- Score: 33.522495018321386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logs produced by extensive software systems are integral to monitoring system
behaviors. Advanced log analysis facilitates the detection, alerting, and
diagnosis of system faults. Log parsing, which entails transforming raw log
messages into structured templates, constitutes a critical phase in the
automation of log analytics. Existing log parsers fail to identify the correct
templates due to reliance on human-made rules. Besides, these methods focus on
statistical features while ignoring semantic information in log messages. To
address these challenges, we introduce a cutting-edge Log parsing
framework with Entropy sampling and Chain-of-Thought Merging
(Lemur). Specifically, to discard the tedious manual rules, we propose a novel
sampling method inspired by information entropy, which efficiently clusters
typical logs. Furthermore, to enhance the merging of log templates, we design a
chain-of-thought method for large language models (LLMs). LLMs exhibit
exceptional semantic comprehension, deftly distinguishing between parameters
and invariant tokens. We have conducted experiments on large-scale public
datasets. Extensive evaluation demonstrates that Lemur achieves
state-of-the-art performance and impressive efficiency.
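As a rough illustration of the two ideas the abstract names (turning raw log messages into templates, and using information entropy to tell constant tokens from parameters), the following minimal sketch computes the Shannon entropy of each token position across a group of messages and masks the positions that vary. This is not Lemur's algorithm: the function names, the `<*>` placeholder, the whitespace tokenization, the equal-length assumption, and the sample log lines are all assumptions made for the example.

```python
import math
from collections import Counter


def position_entropy(tokens_at_position):
    """Shannon entropy (in bits) of the tokens observed at one position."""
    counts = Counter(tokens_at_position)
    total = len(tokens_at_position)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_template(logs, threshold=0.0):
    """Mask every token position whose entropy exceeds `threshold`.

    Illustrative assumption: all messages split into the same number of
    whitespace-separated tokens.
    """
    rows = [line.split() for line in logs]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all log messages must have the same token count")
    template = []
    for column in zip(*rows):
        # A position that always holds the same token has zero entropy
        # and is kept as an invariant part of the template.
        if position_entropy(column) > threshold:
            template.append("<*>")
        else:
            template.append(column[0])
    return " ".join(template)


# Hypothetical log lines, invented for this example.
logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.7 closed",
]
print(extract_template(logs))  # prints "Connection from <*> closed"
```

The threshold of zero treats any variation at a position as a parameter; a real parser would need a more careful criterion, which is where the semantic judgment attributed to LLMs in the abstract would come in.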
Related papers
- LogLLM: Log-based Anomaly Detection Using Large Language Models [8.03646578793411]
We propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs)
LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences.
Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability.
arXiv Detail & Related papers (2024-11-13T12:18:00Z)
- LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce LogParser-LLM, a novel log parser integrated with LLM capabilities.
We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs.
Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z)
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction [31.31712326361932]
We propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences.
Specifically, we design an end-to-end iterative process, where the keywords of unlabeled logs are first extracted to construct a log-event graph.
Then, we build a subgraph annotator to generate pseudo labels for unlabeled log sequences.
arXiv Detail & Related papers (2022-08-23T09:32:19Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, a major source of system information for troubleshooting.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.