Token Interdependency Parsing (Tipping) -- Fast and Accurate Log Parsing
- URL: http://arxiv.org/abs/2408.00645v1
- Date: Thu, 1 Aug 2024 15:37:22 GMT
- Title: Token Interdependency Parsing (Tipping) -- Fast and Accurate Log Parsing
- Authors: Shayan Hashemi, Mika Mäntylä,
- Abstract summary: Most automated analysis tools include a component designed to separate log templates from their parameters, commonly referred to as a "log"
"Tipping" combines rule-based tokenizers, interdependency token graphs, strongly connected components, and various techniques to ensure rapid, scalable, and precise log parsing.
Tipping can parse 11 million lines of logs in less than 20 seconds on a laptop machine.
- Score: 0.09208007322096533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last decade, an impressive increase in software adaptions has led to a surge in log data production, making manual log analysis impractical and establishing the necessity for automated methods. Conversely, most automated analysis tools include a component designed to separate log templates from their parameters, commonly referred to as a "log parser". This paper aims to introduce a new fast and accurate log parser, named "Tipping". Tipping combines rule-based tokenizers, interdependency token graphs, strongly connected components, and various techniques to ensure rapid, scalable, and precise log parsing. Furthermore, Tipping is parallelized and capable of running on multiple processing cores with close to linear efficiency. We evaluated Tipping against other state-of-the-art log parsers in terms of accuracy, performance, and the downstream task of anomaly detection. Accordingly, we found that Tipping outperformed existing methods in accuracy and performance in our evaluations. More in-depth, Tipping can parse 11 million lines of logs in less than 20 seconds on a laptop machine. Furthermore, we re-implemented a parallelized version of the past IpLom algorithm to demonstrate the effect of parallel processing, and it became the second-fastest parser. As logs keep growing in volume and complexity, the software engineering community needs to ensure automated log analysis tools keep up with the demand, being capable of efficiently handling massive volumes of logs with high accuracy. Tipping's robustness, versatility, efficiency, and scalability make it a viable tool for the modern automated log analysis task.
Related papers
- LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce Log-LLM, a novel log integrated with LLM capabilities.
We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs.
Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z) - HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z) - LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z) - A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art logs in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z) - Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z) - Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
arXiv Detail & Related papers (2023-06-08T18:02:07Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak
Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - On Automatic Parsing of Log Records [0.0]
We create a tool that generates synthetic Apache log records which we used to train recurrent-neural-network-based MT models.
Models' evaluation on real-world logs shows that the models can learn Apache log format and parse individual log records.
arXiv Detail & Related papers (2021-02-12T00:27:41Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.