EvLog: Identifying Anomalous Logs over Software Evolution
- URL: http://arxiv.org/abs/2306.01509v2
- Date: Tue, 15 Aug 2023 05:43:30 GMT
- Title: EvLog: Identifying Anomalous Logs over Software Evolution
- Authors: Yintong Huo, Cheryl Lee, Yuxin Su, Shiwen Shan, Jinyang Liu and
Michael R. Lyu
- Abstract summary: We propose a novel unsupervised approach named Evolving Log analyzer (EvLog), which uses a multi-level representation extractor to process logs without parsing.
EvLog implements an anomaly discriminator with an attention mechanism to identify anomalous logs and avoid the issues caused by unstable log sequences.
EvLog has shown effectiveness in two real-world system evolution log datasets with an average F1 score of 0.955 and 0.847 in the intra-version setting and inter-version setting, respectively.
- Score: 31.46106509190191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software logs record system activities, aiding maintainers in identifying the
underlying causes for failures and enabling prompt mitigation actions. However,
maintainers need to inspect a large volume of daily logs to identify the
anomalous logs that reveal failure details for further diagnosis. Thus, how to
automatically distinguish these anomalous logs from normal logs becomes a
critical problem. Existing approaches alleviate the burden on software
maintainers, but they are built upon an improper yet critical assumption:
logging statements in the software remain unchanged. While software keeps
evolving, our empirical study finds that evolving software brings three
challenges: log parsing errors, evolving log events, and unstable log
sequences.
In this paper, we propose a novel unsupervised approach named Evolving Log
analyzer (EvLog) to mitigate these challenges. We first build a multi-level
representation extractor to process logs without parsing to prevent errors from
the parser. The multi-level representations preserve the essential semantics of
logs while leaving out insignificant changes in evolving events. EvLog then
implements an anomaly discriminator with an attention mechanism to identify the
anomalous logs and avoid the issue brought by the unstable sequence. EvLog has
shown effectiveness in two real-world system evolution log datasets with an
average F1 score of 0.955 and 0.847 in the intra-version setting and
inter-version setting, respectively, which outperforms other state-of-the-art
approaches by a wide margin. To the best of our knowledge, this is the first study on
localizing anomalous logs over software evolution. We believe our work sheds
new light on the impact of software evolution with the corresponding solutions
for the log analysis community.
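For readers unfamiliar with the F1 metric quoted above, the following is a minimal sketch of how it is computed for a binary "anomalous vs. normal" labeling of log lines. The function name `f1_score` and the toy labels are assumptions made for illustration; they are not taken from the paper or its datasets, and the paper's exact evaluation protocol may differ.

```python
def f1_score(predicted, actual):
    """Return the F1 score for binary labels, where True marks an anomalous log."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # anomalies correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # normal logs wrongly flagged
    fn = sum(1 for p, a in pairs if a and not p)    # anomalies that were missed
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
actual = [False, True, True, False, True, False]
predicted = [False, True, False, False, True, True]
score = f1_score(predicted, actual)
```

Because F1 combines precision and recall, a detector cannot score well by flagging every log as anomalous or by flagging almost none.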
Related papers
- Anomaly Detection on Unstable Logs with GPT Models [1.9713190626298576]
This paper reports on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs.
The pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information.
The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases.
arXiv Detail & Related papers (2024-06-11T17:13:18Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z)
- MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures [3.04585143845864]
MoniLog is a distributed approach to detect real-time anomalies within large-scale environments.
It aims to detect sequential and quantitative anomalies within a multi-source log stream.
arXiv Detail & Related papers (2023-04-24T09:21:52Z)
- LogGD: Detecting Anomalies from System Logs by Graph Neural Networks [14.813971618949068]
We propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue.
We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection.
arXiv Detail & Related papers (2022-09-16T11:51:58Z)
- Failure Identification from Unstable Log Data using Deep Learning [0.27998963147546146]
We present CLog as a method for failure identification.
By representing the log data as sequences of subprocesses instead of sequences of log events, the effect of the unstable log data is reduced.
Our experimental results demonstrate that the learned subprocesses representations reduce the instability in the input.
arXiv Detail & Related papers (2022-04-06T07:41:48Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.