Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection
- URL: http://arxiv.org/abs/2308.12612v2
- Date: Wed, 31 Jan 2024 15:52:01 GMT
- Title: Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection
- Authors: Lin Yang, Junjie Chen, Shutao Gao, Zhihao Gong, Hongyu Zhang, Yue Kang, Huaan Li
- Abstract summary: Deep learning (DL) has spurred interest in enhancing log-based anomaly detection.
Traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL.
We optimize the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation.
- Score: 18.328245109223964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of deep learning (DL) has spurred interest in enhancing
log-based anomaly detection. This approach aims to extract meaning from log
events (log message templates) and develop advanced DL models for anomaly
detection. However, these DL methods face challenges like heavy reliance on
training data, labels, and computational resources due to model complexity. In
contrast, traditional machine learning and data mining techniques are less
data-dependent and more efficient but less effective than DL. To make log-based
anomaly detection more practical, the goal is to enhance traditional techniques
to match DL's effectiveness. Previous research in a different domain (linking
questions on Stack Overflow) suggests that optimized traditional techniques can
rival state-of-the-art DL methods. Drawing inspiration from this concept, we
conducted an empirical study. We optimized the unsupervised PCA (Principal
Component Analysis), a traditional technique, by incorporating lightweight
semantic-based log representation. This addresses the issue of log events
unseen in the training data and enhances the log representation. Our study compared seven
log-based anomaly detection methods, including four DL-based, two traditional,
and the optimized PCA technique, using public and industrial datasets. Results
indicate that the optimized unsupervised PCA technique achieves effectiveness
similar to advanced supervised/semi-supervised DL methods while being
more stable with limited training data and more resource-efficient. This
demonstrates the adaptability and strength of traditional techniques through
small yet impactful adaptations.
Related papers
- PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
Date: 2023-10-18T02:59:57Z
- Stabilizing Subject Transfer in EEG Classification with Divergence Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
Date: 2023-10-12T23:06:52Z
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
Date: 2023-04-20T17:11:05Z
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
Date: 2022-03-23T06:24:31Z
- Hybridization of Capsule and LSTM Networks for unsupervised anomaly detection on multivariate data [0.0]
This paper introduces a novel NN architecture which hybridises the Long Short-Term Memory (LSTM) and Capsule Networks into a single network.
The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data.
Date: 2022-02-11T10:33:53Z
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique for estimating the unknown probability distributions of the data samples in the output feature space.
Date: 2021-08-26T14:01:04Z
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
Date: 2021-03-31T23:46:32Z
- Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics [24.878465999976594]
Several object detection methods have been proposed in the literature, the vast majority based on Deep Convolutional Neural Networks (DCNNs).
These methods have important limitations for robotics: learning solely on off-line data may introduce biases and prevents adaptation to novel tasks.
In this work, we investigate how weakly-supervised learning can cope with these problems.
Date: 2020-12-28T16:36:11Z
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
Date: 2020-08-21T07:26:55Z
This list is automatically generated from the titles and abstracts of the papers in this site.