Related papers: Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study

Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study

URL: http://arxiv.org/abs/2402.07281v2
Date: Mon, 26 Feb 2024 04:56:07 GMT
Title: Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study
Authors: Santonu Sarkar, Shanay Mehta, Nicole Fernandes, Jyotirmoy Sarkar and Snehanshu Saha
Abstract summary: This paper evaluates a diverse array of machine learning-based anomaly detection algorithms. The paper contributes significantly by conducting an unbiased comparison of various anomaly detection algorithms.
Score: 0.6291443816903801
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detection of anomalous situations for complex mission-critical systems holds paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from the operational data arises due to the imbalanced class distribution problem since the anomalies are supposed to be rare events. This paper evaluates a diverse array of machine learning-based anomaly detection algorithms through a comprehensive benchmark study. The paper contributes significantly by conducting an unbiased comparison of various anomaly detection algorithms, spanning classical machine learning including various tree-based approaches to deep learning and outlier detection methods. The inclusion of 104 publicly available and a few proprietary industrial systems datasets enhances the diversity of the study, allowing for a more realistic evaluation of algorithm performance and emphasizing the importance of adaptability to real-world scenarios. The paper dispels the deep learning myth, demonstrating that though powerful, deep learning is not a universal solution in this case. We observed that recently proposed tree-based evolutionary algorithms outperform in many scenarios. We noticed that tree-based approaches catch a singleton anomaly in a dataset where deep learning methods fail. On the other hand, classical SVM performs the best on datasets with more than 10% anomalies, implying that such scenarios can be best modeled as a classification problem rather than anomaly detection. To our knowledge, such a study on a large number of state-of-the-art algorithms using diverse data sets, with the objective of guiding researchers and practitioners in making informed algorithmic choices, has not been attempted earlier.

Related papers

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection. Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD)
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
Active anomaly detection based on deep one-class classification [9.904380236739398]
We tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method. First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary. Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively.
arXiv Detail & Related papers (2023-09-18T03:56:45Z)
Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose a task-agnostic evaluation framework Camilla for evaluating machine learning algorithms. We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample. In our experiments, Camilla outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Deep Learning for Time Series Anomaly Detection: A Survey [53.83593870825628]
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning.
arXiv Detail & Related papers (2022-11-09T22:40:22Z)
Deep Learning for Anomaly Detection in Log Data: A Survey [3.508620069426877]
Self-learning anomaly detection techniques capture patterns in log data and report unexpected log event occurrences. Deep learning neural networks for this purpose have been presented. There exist many different architectures for deep learning and it is non-trivial to encode raw and unstructured log data.
arXiv Detail & Related papers (2022-07-08T10:58:28Z)
Anomaly Detection for High-Dimensional Data Using Large Deviations Principle [0.8526086056172273]
We propose an anomaly detection algorithm that can scale to high-dimensional data using concepts from the theory of large deviations. The proposed Large Deviations Anomaly Detection (LAD) algorithm is shown to outperform state of art anomaly detection methods on a variety of large and high-dimensional benchmark data sets.
arXiv Detail & Related papers (2021-09-28T13:13:14Z)
Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection. We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modernally precise analysis of linear multiclass classification. Our analysis reveals that the classification accuracy is highly distribution-dependent. The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
Algorithmic Frameworks for the Detection of High Density Anomalies [0.0]
High-density anomalies are deviant cases positioned in the most normal regions of the data space. This study introduces several non-parametric algorithmic frameworks for unsupervised detection.
arXiv Detail & Related papers (2020-10-09T17:48:02Z)
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs) To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
Anomaly Detection in Univariate Time-series: A Survey on the State-of-the-Art [0.0]
Anomaly detection for time-series data has been an important research field for a long time. Recent years an increasing number of machine learning algorithms have been developed to detect anomalies on time-series. Researchers tried to improve these techniques using (deep) neural networks.
arXiv Detail & Related papers (2020-04-01T13:22:34Z)
Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning [18.49217234413188]
This paper makes four main contributions to improve the state-of-the-art in anomaly discovery using tree-based ensembles. We develop a novel batch active learning algorithm to improve the diversity of discovered anomalies. We present a data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the anomaly detector in a principled manner.
arXiv Detail & Related papers (2019-01-23T23:41:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.