Efficiency of Unsupervised Anomaly Detection Methods on Software Logs
- URL: http://arxiv.org/abs/2312.01934v1
- Date: Mon, 4 Dec 2023 14:44:31 GMT
- Title: Efficiency of Unsupervised Anomaly Detection Methods on Software Logs
- Authors: Jesse Nyyssölä, Mika Mäntylä
- Abstract summary: This paper studies unsupervised and time efficient methods for anomaly detection.
The models are evaluated on four public datasets.
For speed, the OOV detector with word representation is optimal. For accuracy, the OOV detector combined with trigram representation yields the highest AUC-ROC (0.846).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software log analysis can be laborious and time consuming. Time and labeled
data are usually lacking in industrial settings. This paper studies
unsupervised and time efficient methods for anomaly detection. We study two
custom and two established models. The custom models are: an OOV
(Out-Of-Vocabulary) detector, which counts the terms in the test data that are
not present in the training data, and the Rarity Model (RM), which calculates a
rarity score for terms based on their infrequency. The established models are
KMeans and Isolation Forest. The models are evaluated on four public datasets
(BGL, Thunderbird, Hadoop, HDFS) with three different representation techniques
for the log messages (Words, character Trigrams, Parsed events). We used the
AUC-ROC metric for the evaluation. The results reveal discrepancies based on
the dataset and representation technique. Different configurations are advised
based on specific requirements. For speed, the OOV detector with word
representation is optimal. For accuracy, the OOV detector combined with trigram
representation yields the highest AUC-ROC (0.846). When dealing with unfiltered
data where training includes both normal and anomalous instances, the most
effective combination is the Isolation Forest with event representation,
achieving an AUC-ROC of 0.829.
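As a rough illustration of the two custom models described in the abstract, the sketch below scores log lines with an OOV count and a rarity score over word or character-trigram tokens, and computes AUC-ROC by pairwise comparison. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, and in particular the rarity formula (mean negative log of an add-one smoothed relative frequency) are hypothetical choices made for this example, since the abstract does not specify them.

```python
from collections import Counter
import math


def words(line):
    # Word representation: whitespace-separated tokens (assumed tokenizer).
    return line.split()


def char_trigrams(line):
    # Character trigram representation: every 3-character window of the line.
    return [line[i:i + 3] for i in range(len(line) - 2)]


def fit_counts(train_lines, tokenize):
    # "Training" only counts how often each term occurs in the training data.
    counts = Counter()
    for line in train_lines:
        counts.update(tokenize(line))
    return counts


def oov_score(line, counts, tokenize):
    # OOV detector: number of terms in the line that never occurred in training.
    return sum(1 for term in tokenize(line) if term not in counts)


def rarity_score(line, counts, tokenize):
    # Rarity score (hypothetical formula): mean negative log of the add-one
    # smoothed relative frequency, so infrequent and unseen terms score higher.
    terms = tokenize(line)
    if not terms:
        return 0.0
    total = sum(counts.values())
    return sum(-math.log((counts[t] + 1) / (total + 1)) for t in terms) / len(terms)


def auc_roc(scores, labels):
    # AUC-ROC as the probability that a random anomalous line (label 1)
    # receives a higher score than a random normal line (label 0); ties count 0.5.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
train = [
    "connection established to node 1",
    "connection closed by node 1",
]
test = ["connection established to node 1", "kernel panic on node 1"]
labels = [0, 1]

word_counts = fit_counts(train, words)
oov_scores = [oov_score(line, word_counts, words) for line in test]
rarity_scores = [rarity_score(line, word_counts, words) for line in test]
print(oov_scores, auc_roc(oov_scores, labels))
```

Swapping `words` for `char_trigrams` in both `fit_counts` and the scoring call switches the representation without changing the detector, which mirrors how the study pairs each model with several representations.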
Related papers
- ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a "one-for-all" GAD model to detect anomalies across various graph datasets on-the-fly.
Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.
Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised
Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder based time series anomaly detection methods (VAEs).
This work aims to make three novel contributions: 1) the retraining process is formulated as a convex problem and can converge at a fast rate as well as prevent overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store them; and 3) mathematically proving that when fine-tuning the latent vector and reconstructed data, the linear formations can achieve the least adjusting errors between the ground truths and the fine-tuned ones.
arXiv Detail & Related papers (2023-10-09T12:36:16Z) - AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by
Random Labeling [0.0]
Anomaly detection, or more generally outlier detection, is one of the most popular and challenging subjects in theoretical and applied machine learning.
We present a new semi-supervised anomaly detection method called AnoRand by combining a deep learning architecture with random synthetic label generation.
arXiv Detail & Related papers (2023-05-28T10:53:34Z) - Imbalanced Aircraft Data Anomaly Detection [103.01418862972564]
Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task.
We propose a Graphical Temporal Data Analysis framework.
It consists of three modules, named Series-to-Image (S2I), Cluster-based Resampling Approach using Euclidean Distance (CRD) and Variance-Based Loss (VBL).
arXiv Detail & Related papers (2023-05-17T09:37:07Z) - Unsupervised Model Selection for Time-series Anomaly Detection [7.8027110514393785]
We identify three classes of surrogate (unsupervised) metrics, namely, prediction error, model centrality, and performance on injected synthetic anomalies.
We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem.
Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model.
arXiv Detail & Related papers (2022-10-03T16:49:30Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - Latent Outlier Exposure for Anomaly Detection with Contaminated Data [31.446666264334528]
Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset.
We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models.
arXiv Detail & Related papers (2022-02-16T14:21:28Z) - Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D
Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z) - CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via
Conditional Normalizing Flows [0.0]
We propose a real-time model for anomaly detection with localization.
CFLOW-AD consists of a discriminatively pretrained encoder followed by multi-scale generative decoders.
Our experiments on the MVTec dataset show that CFLOW-AD outperforms previous methods by 0.36% AUROC in the detection task, and by 1.12% AUROC and 2.5% AUPRO in the localization task.
arXiv Detail & Related papers (2021-07-27T03:10:38Z) - COPOD: Copula-Based Outlier Detection [7.963284082401154]
Outlier detection refers to the identification of rare items that are deviant from the general data distribution.
Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability.
We present a novel outlier detection algorithm called COPOD.
arXiv Detail & Related papers (2020-09-20T16:06:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.