An Efficient Outlier Detection Algorithm for Data Streaming
- URL: http://arxiv.org/abs/2501.01061v1
- Date: Thu, 02 Jan 2025 05:12:43 GMT
- Title: An Efficient Outlier Detection Algorithm for Data Streaming
- Authors: Rui Hu, Luc, Chen, Yiwei Wang,
- Abstract summary: Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data.
We propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm.
The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases.
- Score: 51.56874851156008
- License:
- Abstract: The nature of modern data is increasingly real-time, making outlier detection crucial in any data-related field, such as finance for fraud detection and healthcare for monitoring patient vitals. Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data due to the need for extensive recalculations with each new data point, limiting their application in real-time environments. While the Incremental LOF (ILOF) algorithm has been developed to tackle the challenges of online anomaly detection, it remains computationally expensive when processing large streams of data points, and its detection performance may degrade after a certain threshold of points have streamed in. In this paper, we propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm. The EILOF algorithm only computes the LOF scores of new points without altering the LOF scores of existing data points. Although exact LOF scores have not yet been computed for the existing points in the new algorithm, datasets often contain noise, and minor deviations in LOF score calculations do not necessarily degrade detection performance. In fact, such deviations can sometimes enhance outlier detection. We systematically tested this approach on both simulated and real-world datasets, demonstrating that EILOF outperforms ILOF as the volume of streaming data increases across various scenarios. The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases compared to the ILOF algorithm.
Related papers
- Score-matching-based Structure Learning for Temporal Data on Networks [17.166362605356074]
Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge.
Current score-matching-based algorithms are primarily designed to analyze independent and identically distributed (i.i.d.) data.
We have developed a new parent-finding subroutine for leaf nodes in DAGs, significantly accelerating the most time-consuming part of the process: the pruning step.
arXiv Detail & Related papers (2024-12-10T12:36:35Z) - LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z) - Quantum Algorithm for Unsupervised Anomaly Detection [5.4335077019052145]
Anomaly detection plays a critical role in fraud detection, health care intrusion detection, military surveillance, etc.
The Local Outlier Factor algorithm (LOF algorithm) has been extensively studied.
Here we present a quantum LOF algorithm consisting of three parts corresponding to the classical algorithm.
arXiv Detail & Related papers (2023-04-18T03:20:11Z) - A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based
on Functional Pruning [5.202524136984542]
We build a flexible nonparametric approach to detect a change in the distribution of a sequence.
Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations.
In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings.
arXiv Detail & Related papers (2023-02-06T11:50:02Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Large-scale Optimization of Partial AUC in a Range of False Positive
Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - IPOF: An Extremely and Excitingly Simple Outlier Detection Booster via
Infinite Propagation [30.91911545889579]
Outlier detection is one of the most popular and continuously rising topics in the data mining field.
In this paper, we consider the score-based outlier detection category and point out that the performance of current outlier detection algorithms might be further boosted by score propagation.
Specifically, we propose Infinite propagation of Outlier Factor (iPOF) algorithm, an extremely and excitingly simple outlier detection booster via infinite propagation.
arXiv Detail & Related papers (2021-08-01T03:48:09Z) - Bayesian Optimization with Machine Learning Algorithms Towards Anomaly
Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in term of accuracy rate, precision, low-false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.