A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition
- URL: http://arxiv.org/abs/2506.01147v1
- Date: Sun, 01 Jun 2025 20:00:00 GMT
- Title: A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition
- Authors: Prerak Srivastava, Giulio Corallo, Sergey Rybalko
- Abstract summary: We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks requiring precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction. Our low-resource character-level parser, tested on the revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
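The abstract does not fix a concrete architecture here, so the following is only a minimal sketch of what a parser of this kind might look like, under two assumptions that are not stated in the paper: character embeddings are pooled into per-word vectors by a small bidirectional GRU, and each word is classified into one of 16 classes, i.e. a 4-bit code separating template words from parameter types. All names, dimensions, and label semantics below are illustrative, not the authors' model.

```python
# Illustrative sketch only -- NOT the authors' architecture. Assumptions:
# character embeddings pooled per word by a bi-GRU, then a 16-way (4-bit)
# classification per word distinguishing template words from parameter types.
import torch
import torch.nn as nn

class CharWordBCDParser(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, hidden=64, n_codes=16):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.word_enc = nn.GRU(char_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_codes)  # 16 classes = one 4-bit code per word

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) tensor of character ids, 0-padded
        emb = self.char_emb(char_ids)               # (n_words, L, char_dim)
        _, h = self.word_enc(emb)                   # h: (2, n_words, hidden)
        word_vec = torch.cat([h[0], h[1]], dim=-1)  # aggregate character embeddings per word
        return self.classifier(word_vec)            # (n_words, n_codes) logits

# Toy usage: encode each word of a log line as ASCII ids, predict one code per word.
words = "Connection from 10.0.0.1 closed".split()
ids = torch.zeros(len(words), 16, dtype=torch.long)
for i, w in enumerate(words):
    for j, c in enumerate(w[:16]):
        ids[i, j] = min(ord(c), 127)
codes = CharWordBCDParser()(ids).argmax(dim=-1)  # untrained, so the codes are arbitrary
print(codes.tolist())
```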
Related papers
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z) - LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models [3.7960472831772774]
This paper introduces LibreLog, an unsupervised log parsing approach that enhances privacy and reduces operational costs while achieving state-of-the-art parsing accuracy.
Our evaluation on LogHub-2.0 shows that LibreLog achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLMs.
arXiv Detail & Related papers (2024-08-02T21:54:13Z) - Log Parsing using LLMs with Self-Generated In-Context Learning and Self-Correction [15.93927602769091]
The recent emergence of large language models (LLMs) has demonstrated strong abilities in understanding natural language and code. Ada is an effective and adaptive log parsing framework using LLMs with self-generated in-context learning (SG-ICL) and self-correction. Ada outperforms state-of-the-art methods across all metrics, even in zero-shot scenarios.
arXiv Detail & Related papers (2024-06-05T15:31:43Z) - Prompting for Automatic Log Template Extraction [6.299547112893045]
DivLog is an effective log parsing framework based on the in-context learning (ICL) ability of large language models (LLMs).
By mining the semantics of examples in the prompt, DivLog generates a target log template in a training-free manner.
arXiv Detail & Related papers (2023-07-19T12:44:59Z) - BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling [65.51280121472146]
We exploit what we intrinsically know about ontology labels to build efficient semantic parsing models.
Our model is highly efficient when evaluated on a low-resource benchmark derived from TOPv2.
arXiv Detail & Related papers (2021-04-15T04:01:02Z) - Fast semantic parsing with well-typedness guarantees [78.76675218975768]
AM dependency parsing is a principled method for neural semantic parsing with high accuracy across multiple graphbanks.
We describe an A* parser and a transition-based parser for AM dependency parsing which guarantee well-typedness and improve parsing speed by up to 3 orders of magnitude.
arXiv Detail & Related papers (2020-09-15T21:54:01Z) - A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and comparable performance to state-of-the-art greedy parsers.
arXiv Detail & Related papers (2020-09-02T19:28:40Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling; a toy sketch of this formulation appears after this entry.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
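As a toy illustration of the masked-language-modeling formulation mentioned in the NuLog entry above (not NuLog's implementation): tokens that can be reconstructed with high confidence when masked are treated as template constants, and the rest as parameters. The positional frequency table standing in for a neural model and the 0.9 threshold are assumptions made purely for brevity.

```python
# Toy illustration of the MLM-style formulation described above -- not NuLog itself.
# A real model would predict masked tokens with a neural MLM; here a positional
# frequency table stands in for prediction confidence (an assumption for brevity).
from collections import Counter

logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 172.16.0.5 closed",
]

# Count how often each token appears at each position across the log lines.
position_counts = [Counter() for _ in range(len(logs[0].split()))]
for line in logs:
    for i, tok in enumerate(line.split()):
        position_counts[i][tok] += 1

def to_template(line, threshold=0.9):
    out = []
    for i, tok in enumerate(line.split()):
        confidence = position_counts[i][tok] / sum(position_counts[i].values())
        out.append(tok if confidence >= threshold else "<*>")  # low confidence -> parameter
    return " ".join(out)

print(to_template(logs[0]))  # -> "Connection from <*> closed"
```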