Related papers: An Efficient Outlier Detection Algorithm for Data Streaming

An Efficient Outlier Detection Algorithm for Data Streaming

URL: http://arxiv.org/abs/2501.01061v1
Date: Thu, 02 Jan 2025 05:12:43 GMT
Title: An Efficient Outlier Detection Algorithm for Data Streaming
Authors: Rui Hu, Luc, Chen, Yiwei Wang,
Abstract summary: Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data.<n>We propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm.<n>The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases.
Score: 51.56874851156008
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The nature of modern data is increasingly real-time, making outlier detection crucial in any data-related field, such as finance for fraud detection and healthcare for monitoring patient vitals. Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data due to the need for extensive recalculations with each new data point, limiting their application in real-time environments. While the Incremental LOF (ILOF) algorithm has been developed to tackle the challenges of online anomaly detection, it remains computationally expensive when processing large streams of data points, and its detection performance may degrade after a certain threshold of points have streamed in. In this paper, we propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm. The EILOF algorithm only computes the LOF scores of new points without altering the LOF scores of existing data points. Although exact LOF scores have not yet been computed for the existing points in the new algorithm, datasets often contain noise, and minor deviations in LOF score calculations do not necessarily degrade detection performance. In fact, such deviations can sometimes enhance outlier detection. We systematically tested this approach on both simulated and real-world datasets, demonstrating that EILOF outperforms ILOF as the volume of streaming data increases across various scenarios. The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases compared to the ILOF algorithm.

Related papers

Tighten The Lasso: A Convex Hull Volume-based Anomaly Detection Method [0.6144680854063939]
We propose a novel anomaly detection algorithm based on the convex hull property of a dataset. Our algorithm computes the CH's volume as an increasing number of data points are removed from the dataset. We show that with a computationally cheap and simple check, one can detect datasets that are well-suited for the proposed algorithm.
arXiv Detail & Related papers (2025-02-25T19:39:20Z)
Score-matching-based Structure Learning for Temporal Data on Networks [17.166362605356074]
Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge.<n>Current score-matching-based algorithms are primarily designed to analyze independent and identically distributed (i.i.d.) data.<n>We have developed a new parent-finding subroutine for leaf nodes in DAGs, significantly accelerating the most time-consuming part of the process: the pruning step.
arXiv Detail & Related papers (2024-12-10T12:36:35Z)
LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based on Functional Pruning [5.202524136984542]
We build a flexible nonparametric approach to detect a change in the distribution of a sequence. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings.
arXiv Detail & Related papers (2023-02-06T11:50:02Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning. We develop an efficient approximated gradient descent method based on recent practical envelope smoothing technique. Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
IPOF: An Extremely and Excitingly Simple Outlier Detection Booster via Infinite Propagation [30.91911545889579]
Outlier detection is one of the most popular and continuously rising topics in the data mining field. In this paper, we consider the score-based outlier detection category and point out that the performance of current outlier detection algorithms might be further boosted by score propagation. Specifically, we propose Infinite propagation of Outlier Factor (iPOF) algorithm, an extremely and excitingly simple outlier detection booster via infinite propagation.
arXiv Detail & Related papers (2021-08-01T03:48:09Z)
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in term of accuracy rate, precision, low-false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.