Hue: A User-Adaptive Parser for Hybrid Logs
- URL: http://arxiv.org/abs/2308.07085v1
- Date: Mon, 14 Aug 2023 11:28:50 GMT
- Title: Hue: A User-Adaptive Parser for Hybrid Logs
- Authors: Junjielong Xu, Qiuai Fu, Zhouruixing Zhu, Yutong Cheng, Zhijing Li,
Yuchi Ma, Pinjia He
- Abstract summary: Hue converts each log message to a sequence of special wildcards using a key casting table.
Hue can effectively utilize user feedback via a novel merge-reject strategy.
Hue has been successfully deployed in a real production environment for daily hybrid log parsing.
- Score: 4.783349633632601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Log parsing, which extracts log templates from semi-structured logs and
produces structured logs, is the first and the most critical step in automated
log analysis. While existing log parsers have achieved decent results, they
suffer from two major limitations by design. First, they do not natively
support hybrid logs that consist of both single-line logs and multi-line logs
(e.g., Java exceptions and Hadoop Counters). Second, they fall short in
integrating domain knowledge in parsing, making it hard to identify ambiguous
tokens in logs. This paper defines a new research problem, hybrid log
parsing, as a superset of traditional log parsing tasks, and proposes
Hue, the first attempt at hybrid log parsing in a user-adaptive
manner. Specifically, Hue converts each log message to a sequence of special
wildcards using a key casting table and determines the log types via line
aggregating and pattern extracting. In addition, Hue can effectively utilize
user feedback via a novel merge-reject strategy, making it possible to quickly
adapt to complex and changing log templates. We evaluated Hue on three hybrid
log datasets and sixteen widely-used single-line log datasets (i.e., Loghub). The
results show that Hue achieves an average grouping accuracy of 0.845 on hybrid
logs, which largely outperforms the best results (0.563 on average) obtained by
existing parsers. Hue also exhibits SOTA performance on single-line log
datasets. Furthermore, Hue has been successfully deployed in a real production
environment for daily hybrid log parsing.
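The abstract sketches Hue's pipeline (key casting, line aggregation, pattern extraction) without implementation details. The following is a minimal illustrative sketch in Python of what casting tokens through a key casting table and aggregating multi-line messages could look like; the table entries, wildcard names, and the timestamp-based continuation heuristic are assumptions made for illustration, not Hue's actual rules.

```python
import re

# Hypothetical key casting table: regexes that cast tokens to wildcards.
# The real table, wildcard set, and aggregation rules are defined in the paper.
KEY_CASTING_TABLE = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "<DATE>"),
    (re.compile(r"^\d{2}:\d{2}:\d{2}([.,]\d+)?$"), "<TIME>"),
    (re.compile(r"^\d+(\.\d+)?(\.\d+)*$"), "<NUM>"),
    (re.compile(r"^(/[^/\s]+)+$"), "<PATH>"),
]

def cast_line(line):
    """Convert one raw log line into a sequence of wildcards and literal keys."""
    seq = []
    for token in line.split():
        for pattern, wildcard in KEY_CASTING_TABLE:
            if pattern.match(token):
                seq.append(wildcard)
                break
        else:
            seq.append(token)  # keep literal keys (e.g. "ERROR", "Counters:")
    return tuple(seq)

def aggregate(lines):
    """Toy line aggregation: lines that do not start with a timestamp are
    treated as continuations of the previous message (e.g. Java stack traces)."""
    groups, current = [], []
    for line in lines:
        seq = cast_line(line)
        starts_new = seq and seq[0] in ("<DATE>", "<TIME>")
        if starts_new and current:
            groups.append(current)
            current = []
        current.append(seq)
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    sample = [
        "2023-08-14 11:28:50 ERROR Failed to open /var/log/app.log",
        "java.io.FileNotFoundException: /var/log/app.log",
        "    at java.io.FileInputStream.open(Native Method)",
        "2023-08-14 11:28:51 INFO Retrying in 5 seconds",
    ]
    for group in aggregate(sample):
        print(group)  # first group keeps the exception lines with their trigger line
```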
Related papers
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z)
- Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
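As a rough illustration of the pre-train-then-transfer pattern described in the LogFormer entry above, one could pre-train a shared encoder on source-domain logs and reuse its parameters on the target domain. The PyTorch sketch below is hypothetical; the module sizes, the frozen-encoder choice, and the classification heads are placeholders, not LogFormer's actual architecture.

```python
import torch.nn as nn

class SharedLogEncoder(nn.Module):
    """Encoder whose parameters are shared between source and target domains."""
    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids):  # token_ids: (batch, seq_len) integer tensor
        return self.encoder(self.embed(token_ids)).mean(dim=1)

# Stage 1: pre-train the encoder plus a source head to capture shared log semantics.
encoder = SharedLogEncoder()
source_head = nn.Linear(64, 2)   # e.g. normal vs. anomalous
# ... pre-training loop over source-domain logs would go here ...

# Stage 2: transfer the shared encoder parameters to the target domain and
# train only a lightweight target-specific head.
for param in encoder.parameters():
    param.requires_grad = False
target_head = nn.Linear(64, 2)
# ... fine-tuning loop over target-domain logs would go here ...
```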
- A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. In particular, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z)
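One widely used existing metric in this line of work is grouping accuracy, the metric reported for Hue in the main abstract above. The sketch below follows a common definition, counting a message as correct when its predicted group contains exactly the same messages as its ground-truth group; the exact formulation used in these papers may differ.

```python
from collections import defaultdict

def grouping_accuracy(predicted, truth):
    """Fraction of messages whose predicted template group contains exactly
    the same set of messages as their ground-truth group."""
    pred_groups, true_groups = defaultdict(set), defaultdict(set)
    for i, (p, t) in enumerate(zip(predicted, truth)):
        pred_groups[p].add(i)
        true_groups[t].add(i)
    correct = sum(
        1 for p, t in zip(predicted, truth) if pred_groups[p] == true_groups[t]
    )
    return correct / len(truth) if truth else 0.0

# Messages 0-1 are grouped together correctly; 2-3 are split apart, so GA = 0.5.
print(grouping_accuracy(["A", "A", "B", "C"], ["X", "X", "Y", "Y"]))
```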
- Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
- Prompting for Automatic Log Template Extraction [6.299547112893045]
DivLog is an effective log parsing framework based on the in-context learning (ICL) ability of large language models (LLMs).
By mining the semantics of examples in the prompt, DivLog generates a target log template in a training-free manner.
arXiv Detail & Related papers (2023-07-19T12:44:59Z)
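The DivLog entry above describes mining the semantics of in-context examples in the prompt. A minimal sketch of such a prompt builder follows; the instruction wording and example selection are placeholders, not DivLog's actual prompts.

```python
def build_icl_prompt(examples, query_log):
    """Assemble a demonstration-based prompt asking an LLM for a log template.
    `examples` are (log message, template) pairs chosen by the caller."""
    lines = ["Extract the template of the last log message.", ""]
    for log, template in examples:
        lines += [f"Log: {log}", f"Template: {template}", ""]
    lines += [f"Log: {query_log}", "Template:"]
    return "\n".join(lines)

demos = [
    ("Connection from 10.0.0.5 closed", "Connection from <*> closed"),
    ("Took 12 ms to process request", "Took <*> ms to process request"),
]
print(build_icl_prompt(demos, "Connection from 192.168.1.7 closed"))
```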
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
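A minimal sketch of the identifier idea described above: enumerating all word n-grams of a passage so that any of them can resolve back to the passage, rather than assigning a single structured ID. The toy inverted index below stands in for the paper's constrained decoding and scoring machinery, which is not shown.

```python
def passage_ngrams(passage, max_n=3):
    """All word n-grams (up to max_n) of a passage; any of them can serve as
    an identifier that resolves back to the passage."""
    tokens = passage.split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }

# Toy inverted index from n-gram identifiers to passage ids.
passages = {0: "log parsing extracts templates", 1: "templates structure log analysis"}
index = {}
for pid, text in passages.items():
    for gram in passage_ngrams(text):
        index.setdefault(gram, set()).add(pid)

# A generated n-gram such as ("log", "parsing") resolves to passage 0.
print(index[("log", "parsing")])
```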
- On Automatic Parsing of Log Records [0.0]
We create a tool that generates synthetic Apache log records, which we use to train recurrent-neural-network-based machine translation (MT) models.
Evaluation on real-world logs shows that the models can learn the Apache log format and parse individual log records.
arXiv Detail & Related papers (2021-02-12T00:27:41Z)
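The entry above mentions generating synthetic Apache log records for training. A toy generator in the Apache Common Log Format is sketched below; the field choices are placeholders, and this is not the tool from the paper.

```python
import random
from datetime import datetime, timedelta

METHODS = ["GET", "POST"]
PATHS = ["/index.html", "/api/items", "/login"]
STATUSES = [200, 301, 404, 500]

def synthetic_apache_record(ts):
    """One record in Apache Common Log Format:
    host ident authuser [date] "request" status bytes"""
    host = ".".join(str(random.randint(1, 254)) for _ in range(4))
    request = f"{random.choice(METHODS)} {random.choice(PATHS)} HTTP/1.1"
    date = ts.strftime("%d/%b/%Y:%H:%M:%S +0000")
    return f'{host} - - [{date}] "{request}" {random.choice(STATUSES)} {random.randint(100, 5000)}'

start = datetime(2021, 2, 12)
for i in range(3):
    print(synthetic_apache_record(start + timedelta(seconds=i)))
```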
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM).
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
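A minimal sketch of the masked-language-modeling formulation described in the NuLog entry above: tokens of a log message are randomly masked, and a model would be trained to recover them. The masking scheme below is a placeholder, not NuLog's implementation.

```python
import random

MASK = "<MASK>"

def mask_log_tokens(log, mask_prob=0.15, seed=None):
    """Return (masked tokens, original tokens) as an MLM-style training pair."""
    rng = random.Random(seed)
    tokens = log.split()
    masked = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return masked, tokens

masked, target = mask_log_tokens(
    "Received block blk_123 of size 67108864 from 10.0.0.8", mask_prob=0.3, seed=0
)
print(masked)   # a model is trained to predict the original tokens at <MASK> positions
print(target)
```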