On Automatic Parsing of Log Records
- URL: http://arxiv.org/abs/2102.06320v1
- Date: Fri, 12 Feb 2021 00:27:41 GMT
- Title: On Automatic Parsing of Log Records
- Authors: Jared Rand and Andriy Miranskyy
- Abstract summary: We create a tool that generates synthetic Apache log records which we used to train recurrent-neural-network-based MT models.
Models' evaluation on real-world logs shows that the models can learn Apache log format and parse individual log records.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software log analysis helps to maintain the health of software solutions and
ensure compliance and security. Existing software systems consist of
heterogeneous components emitting logs in various formats. A typical solution
is to unify the logs using manually built parsers, which is laborious.
Instead, we explore the possibility of automating the parsing task by
employing machine translation (MT). We create a tool that generates synthetic
Apache log records which we used to train recurrent-neural-network-based MT
models. Models' evaluation on real-world logs shows that the models can learn
Apache log format and parse individual log records. The median relative edit
distance between an actual real-world log record and the MT prediction is less
than or equal to 28%. Thus, we show that log parsing using an MT approach is
promising.
Related papers
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z) - Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z) - LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z) - A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? [42.56249610409624]
We provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems.
We conduct a thorough re-evaluation of 15 state-of-the-art logs in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions.
arXiv Detail & Related papers (2023-08-21T16:24:15Z) - Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z) - Hue: A User-Adaptive Parser for Hybrid Logs [4.783349633632601]
Hue converts each log message to a sequence of special wildcards using a key casting table.
Hue can effectively utilize user feedback via a novel merge-reject strategy.
Hue has been successfully deployed in a real production environment for daily hybrid log parsing.
arXiv Detail & Related papers (2023-08-14T11:28:50Z) - Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval [129.25914272977542]
RetoMaton is a weighted finite automaton built on top of the datastore.
Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity.
arXiv Detail & Related papers (2022-01-28T21:38:56Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak
Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.