Self-Supervised Log Parsing
- URL: http://arxiv.org/abs/2003.07905v1
- Date: Tue, 17 Mar 2020 19:25:25 GMT
- Title: Self-Supervised Log Parsing
- Authors: Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso and Odej Kao
- Abstract summary: Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
- Score: 59.04636530383049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logs are extensively used during the development and maintenance of software
systems. They collect runtime events and allow tracking of code execution,
which enables a variety of critical tasks such as troubleshooting and fault
detection. However, large-scale software systems generate massive volumes of
semi-structured log records, posing a major challenge for automated analysis.
Parsing semi-structured records with free-form text log messages into
structured templates is the first and crucial step that enables further
analysis. Existing approaches rely on log-specific heuristics or manual rule
extraction. These are often specialized in parsing certain log types, and thus,
limit performance scores and generalization. We propose a novel parsing
technique called NuLog that utilizes a self-supervised learning model and
formulates the parsing task as masked language modeling (MLM). In the process
of parsing, the model extracts summarizations from the logs in the form of a
vector embedding. This allows the coupling of the MLM as pre-training with a
downstream anomaly detection task. We evaluate the parsing performance of NuLog
on 10 real-world log datasets and compare the results with 12 parsing
techniques. The results show that NuLog outperforms existing methods in parsing
accuracy with an average of 99% and achieves the lowest edit distance to the
ground truth templates. Additionally, two case studies are conducted to
demonstrate the ability of the approach for log-based anomaly detection in both
supervised and unsupervised scenarios. The results show that NuLog can be
successfully used to support troubleshooting tasks. The implementation is
available at https://github.com/nulog/nulog.
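The MLM formulation can be illustrated with a toy sketch: each token position is treated as masked and a model predicts the hidden token; positions predicted with high confidence are kept as constant template parts, while low-confidence positions become parameters (`<*>`). Here a simple per-position frequency model stands in for NuLog's Transformer, and the confidence threshold is an assumption chosen for illustration, not NuLog's actual setting.

```python
from collections import Counter, defaultdict

def parse_templates(logs, threshold=0.5):
    """Toy stand-in for MLM-based log parsing: for each token position,
    estimate how confidently a model would predict the token there.
    Confident positions are constants of the template; the rest are
    parameters, written as <*>. A per-position token-frequency model
    replaces the Transformer MLM used by NuLog."""
    position_counts = defaultdict(Counter)
    tokenized = [log.split() for log in logs]
    for tokens in tokenized:
        for i, tok in enumerate(tokens):
            position_counts[i][tok] += 1
    templates = []
    for tokens in tokenized:
        template = []
        for i, tok in enumerate(tokens):
            counts = position_counts[i]
            confidence = counts[tok] / sum(counts.values())
            template.append(tok if confidence >= threshold else "<*>")
        templates.append(" ".join(template))
    return templates
```

For example, three messages `connection from 10.0.0.x closed` that differ only in the IP address all collapse to the template `connection from <*> closed`. The real model generalizes this decision across clusters of heterogeneous logs; the frequency stand-in only works within one message type.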
Related papers
- HELP: Hierarchical Embeddings-based Log Parsing [0.25112747242081457]
Logs are a first-hand source of information for software maintenance and failure diagnosis.
Log parsing is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis.
Existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies.
arXiv Detail & Related papers (2024-08-15T17:54:31Z) - Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs [18.240096266464544]
We propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data.
We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective for log parsing.
arXiv Detail & Related papers (2024-06-10T10:39:28Z) - Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging [33.522495018321386]
We introduce a cutting-edge Log parsing framework with Entropy sampling and Chain-of-Thought Merging (Lemur).
We propose a novel sampling method inspired by information entropy, which efficiently clusters typical logs.
Lemur achieves state-of-the-art performance and impressive efficiency.
arXiv Detail & Related papers (2024-02-28T09:51:55Z) - LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z) - Prompting for Automatic Log Template Extraction [6.299547112893045]
DivLog is an effective log parsing framework based on the in-context learning (ICL) ability of large language models (LLMs).
By mining the semantics of examples in the prompt, DivLog generates a target log template in a training-free manner.
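In-context learning for template extraction amounts to assembling a few demonstrations into a prompt and letting the LLM complete the target. The `Log:`/`Template:` layout and the `<*>` placeholder below are a hypothetical format for illustration, not DivLog's exact prompt.

```python
def build_icl_prompt(examples, target_log):
    """Assemble a few-shot prompt from (log, template) demonstrations,
    followed by the target log. The LLM is expected to complete the
    final 'Template:' line with variable parts replaced by <*>."""
    blocks = [f"Log: {log}\nTemplate: {template}" for log, template in examples]
    blocks.append(f"Log: {target_log}\nTemplate:")
    return "\n\n".join(blocks)
```

No training step is involved: the quality of the output depends entirely on how representative the selected demonstrations are, which is why example selection is central to such frameworks.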
arXiv Detail & Related papers (2023-07-19T12:44:59Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and modeling the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
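The weak-labeling step described above can be sketched in a few lines: messages whose timestamps fall inside an estimated failure window are marked abnormal, and the resulting noisy labels would then be used to train the attention model (omitted here). Representing timestamps as plain floats and labels as 0/1 integers is a simplifying assumption.

```python
def weak_labels(entries, failure_windows):
    """Label each (timestamp, message) entry: 1 (abnormal) if its
    timestamp lies inside any estimated failure time window reported
    by a monitoring system, else 0 (normal)."""
    return [
        1 if any(start <= ts <= end for start, end in failure_windows) else 0
        for ts, _msg in entries
    ]
```

The wider the failure windows, the noisier these labels become; the paper's point is that the downstream model remains accurate even then.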
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - On Automatic Parsing of Log Records [0.0]
We create a tool that generates synthetic Apache log records which we used to train recurrent-neural-network-based MT models.
Evaluation of the models on real-world logs shows that they can learn the Apache log format and parse individual log records.
arXiv Detail & Related papers (2021-02-12T00:27:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.