An Efficient Anomaly Detection Approach using Cube Sampling with Streaming Data
- URL: http://arxiv.org/abs/2110.01813v1
- Date: Tue, 5 Oct 2021 04:23:00 GMT
- Title: An Efficient Anomaly Detection Approach using Cube Sampling with Streaming Data
- Authors: Seemandhar Jain, Prarthi Jain, Abhishek Srivastava
- Abstract summary: Anomaly detection is critical in various fields, including intrusion detection, health monitoring, fault diagnosis, and sensor network event detection.
The isolation forest (or iForest) approach is a well-known technique for detecting anomalies.
We propose an efficient iForest based approach for anomaly detection using cube sampling that is effective on streaming data.
- Score: 2.0515785954568626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection is critical in various fields, including intrusion
detection, health monitoring, fault diagnosis, and sensor network event
detection. The isolation forest (or iForest) approach is a well-known technique
for detecting anomalies. It is, however, ineffective when dealing with dynamic
streaming data, which is becoming increasingly prevalent in a wide variety of
application areas these days. In this work, we extend our previous work by
proposing an efficient iForest-based approach for anomaly detection using cube
sampling that is effective on streaming data. Cube sampling is used in the
initial stage to choose nearly balanced samples, significantly reducing storage
requirements while preserving efficiency. Following that, the streaming nature
of data is addressed by a sliding window technique that generates consecutive
chunks of data for systematic processing. The novelty of this paper lies in
applying cube sampling within iForest and calculating the inclusion probabilities.
The proposed approach is as successful at detecting anomalies as existing
state-of-the-art approaches, while requiring significantly less storage and
having lower time complexity. We undertake empirical evaluations of the proposed approach using
standard datasets and demonstrate that it outperforms traditional approaches in
terms of Area Under the ROC Curve (AUC-ROC) and can handle high-dimensional
streaming data.
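The abstract does not say how the cube sample is drawn. As a rough illustration only, the sketch below implements the generic cube method of Deville and Tillé for balanced sampling. A balanced sample S satisfies Σ_{k∈S} x_k/π_k ≈ Σ_k x_k for every auxiliary variable x, where π_k is the inclusion probability of unit k. The flight phase moves the vector π at random inside the null space of these balancing constraints until most entries are 0 or 1; the landing phase here simply drops constraints one at a time, which is only one of several possible landing strategies. The function name `cube_sample`, the equal inclusion probabilities, and the use of the raw features as auxiliary variables are assumptions for illustration, not the paper's implementation. It runs one SVD per step, which is fine for window-sized data but not optimized.

```python
import numpy as np


def cube_sample(X, pi, rng=None, tol=1e-9):
    """Hypothetical helper: balanced sampling with the cube method (flight + simple landing).

    X  : (N, p) auxiliary variables the sample should be balanced on (assumed: raw features).
    pi : (N,) inclusion probabilities in (0, 1); their sum is the target sample size.
    Returns a boolean mask of the selected units.
    """
    rng = np.random.default_rng(rng)
    pi = np.asarray(pi, dtype=float).copy()
    # Balancing matrix: one row per constraint, column k = [pi_k, x_k] / pi_k.
    # The first row is all ones, so it keeps the sample size fixed.
    A = (np.column_stack([pi, X]) / pi[:, None]).T
    n_rows = A.shape[0]
    while True:
        pi[pi < tol] = 0.0          # snap units that reached a face of the cube
        pi[pi > 1 - tol] = 1.0
        free = np.flatnonzero((pi > 0) & (pi < 1))
        if free.size == 0:
            break
        if n_rows == 0:
            # No constraints left: round the remaining units independently.
            pi[free] = (rng.random(free.size) < pi[free]).astype(float)
            break
        B = A[:n_rows, free]
        _, s, vt = np.linalg.svd(B)  # rows of vt beyond the rank span the null space of B
        rank = int(np.sum(s > tol * max(1.0, s.max())))
        if rank >= free.size:
            # Landing phase (simplified): no move keeps every constraint, so drop the last one.
            n_rows -= 1
            continue
        # Random direction u with B @ u = 0, so the balancing totals A @ pi stay unchanged.
        u = rng.standard_normal(free.size - rank) @ vt[rank:]
        p = pi[free]
        with np.errstate(divide="ignore", invalid="ignore"):
            # Largest steps along +u and -u that keep every free pi_k inside [0, 1].
            lam1 = np.where(u > 0, (1 - p) / u, np.where(u < 0, -p / u, np.inf)).min()
            lam2 = np.where(u > 0, p / u, np.where(u < 0, (p - 1) / u, np.inf)).min()
        # Martingale step: the expected new pi equals the old pi, preserving inclusion probabilities.
        if rng.random() < lam2 / (lam1 + lam2):
            pi[free] = np.clip(p + lam1 * u, 0.0, 1.0)
        else:
            pi[free] = np.clip(p - lam2 * u, 0.0, 1.0)
    return pi > 0.5
```

For example, `cube_sample(X, np.full(len(X), 0.1))` selects exactly 10% of the rows of a window `X` whenever that count is an integer, because the all-ones balancing row is the last constraint to be dropped and it keeps the sum of the inclusion probabilities fixed.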
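The sketch below illustrates the standard isolation-forest scoring rule, s(x) = 2^(−E[h(x)]/c(ψ)) with c(n) = 2H(n−1) − 2(n−1)/n, applied to consecutive, non-overlapping windows of a stream. It is a from-scratch illustration, not the authors' code: the helper names, window size, and tree count are assumptions. Each tree is trained on a uniform random subsample; per the abstract, this subsampling step is presumably where the proposed approach uses a cube sample instead.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Grow an isolation tree: random feature, random split between that feature's min and max."""
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    q = int(rng.integers(X.shape[1]))
    lo, hi = X[:, q].min(), X[:, q].max()
    if lo == hi:
        return ("leaf", len(X))
    split = rng.uniform(lo, hi)
    left = X[:, q] < split
    return ("node", q, split,
            build_tree(X[left], depth + 1, max_depth, rng),
            build_tree(X[~left], depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for the points left unseparated there."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, q, split, left, right = node
    return path_length(x, left if x[q] < split else right, depth + 1)


def iforest_scores(X, n_trees=100, psi=256, rng=None):
    """Hypothetical helper: fit an isolation forest on X and return s(x) for every row of X."""
    if len(X) < 2:
        raise ValueError("need at least two points per window")
    rng = np.random.default_rng(rng)
    psi = min(psi, len(X))
    max_depth = math.ceil(math.log2(psi))
    trees = []
    for _ in range(n_trees):
        # Uniform subsample; the proposed approach would presumably draw a cube sample here.
        idx = rng.choice(len(X), size=psi, replace=False)
        trees.append(build_tree(X[idx], 0, max_depth, rng))
    mean_h = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_h / c(psi))


def score_stream(stream, window, **kwargs):
    """Hypothetical helper: split the stream into consecutive windows and score each one."""
    for start in range(0, len(stream) - window + 1, window):
        chunk = stream[start:start + window]
        yield start, iforest_scores(chunk, **kwargs)
```

Scores close to 1 mark points that are isolated after very few random splits, while scores well below 0.5 indicate normal points. Because each window is processed on its own, memory use is bounded by the window size rather than by the length of the stream.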
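The evaluation reports AUC-ROC. One way to compute it, shown here as a small sketch, uses its rank interpretation: AUC-ROC is the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half (the Mann-Whitney U statistic divided by the number of anomaly/normal pairs). The function name `auc_roc` is illustrative.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random anomaly outscores a random normal point.

    labels: 1 for anomaly, 0 for normal; scores: higher means more anomalous.
    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one anomaly and one normal point")
    # Compare every (anomaly, normal) pair; O(n_pos * n_neg), chosen for readability.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop keeps the definition visible; a sorting-based version computes the same value in O(n log n) for large evaluation sets.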
Related papers
- Research on Dynamic Data Flow Anomaly Detection based on Machine Learning [11.526496773281938]
In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows.
By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data.
Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.
(2024-09-23T08:19:15Z)
- TraceMesh: Scalable and Streaming Sampling for Distributed Traces [51.08892669409318]
TraceMesh is a scalable and streaming sampler for distributed traces.
It accommodates previously unseen trace features in a unified and streamlined way.
TraceMesh outperforms state-of-the-art methods by a significant margin in both sampling accuracy and efficiency.
(2024-06-11T06:13:58Z)
- DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems [3.44012349879073]
We present DeepHYDRA (Deep Hybrid DBSCAN/Reduction-Based Anomaly Detection).
It combines DBSCAN and learning-based anomaly detection.
It is shown to reliably detect different types of anomalies in both large and complex datasets.
(2024-05-13T13:47:15Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
(2024-01-29T03:42:37Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
(2023-10-04T09:39:05Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves a linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
(2022-04-23T02:45:55Z)
- GAN Based Boundary Aware Classifier for Detecting Out-of-distribution Samples [24.572516991009323]
We propose a GAN-based boundary-aware classifier (GBAC) for generating a closed hyperspace that contains most of the in-distribution (ID) data.
Our method is based on the fact that a traditional neural network separates the feature space into several unclosed regions, which are not suitable for out-of-distribution (OOD) detection.
With GBAC as an auxiliary module, OOD data lying outside the closed hyperspace are assigned much lower scores, allowing more effective OOD detection.
(2021-12-22T03:35:54Z)
- Fast Wireless Sensor Anomaly Detection based on Data Stream in Edge Computing Enabled Smart Greenhouse [5.716360276016705]
An edge-computing-enabled smart greenhouse is a representative application of Internet of Things technology.
Traditional anomaly detection algorithms have not properly considered the inherent characteristics of the data streams produced by wireless sensors.
(2021-07-28T13:32:12Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
(2021-05-30T22:07:13Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
(2020-09-16T15:52:04Z)
- Anomaly Detection in Trajectory Data with Normalizing Flows [0.0]
We propose an approach based on normalizing flows that enables complex density estimation from data with neural networks.
Our proposal computes exact model likelihood values, an important feature of normalizing flows, for each segment of the trajectory.
We evaluate our methodology, named aggregated anomaly detection with normalizing flows (GRADINGS), using real world trajectory data and compare it with more traditional anomaly detection techniques.
(2020-04-13T14:16:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.