Tighten The Lasso: A Convex Hull Volume-based Anomaly Detection Method
- URL: http://arxiv.org/abs/2502.18601v1
- Date: Tue, 25 Feb 2025 19:39:20 GMT
- Title: Tighten The Lasso: A Convex Hull Volume-based Anomaly Detection Method
- Authors: Uri Itai, Asael Bar Ilan, Teddy Lazebnik
- Abstract summary: We propose a novel anomaly detection algorithm based on the convex hull (CH) property of a dataset. Our algorithm computes the CH's volume as an increasing number of data points are removed from the dataset. We show that with a computationally cheap and simple check, one can detect datasets that are well-suited for the proposed algorithm.
- Score: 0.6144680854063939
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid advancements in data-driven methodologies have underscored the critical importance of ensuring data quality. Consequently, detecting out-of-distribution (OOD) data has emerged as an essential task for maintaining the reliability and robustness of data-driven models in general, and machine and deep learning models in particular. In this study, we leveraged the convex hull (CH) property of a dataset, and the fact that anomalies contribute strongly to the increase in the CH's volume, to propose a novel anomaly detection algorithm. Our algorithm computes the CH's volume as an increasing number of data points are removed from the dataset to define a decision line between OOD and in-distribution data points. We compared the proposed algorithm to seven widely used anomaly detection algorithms over ten datasets, showing results comparable to state-of-the-art (SOTA) algorithms. Moreover, we show that with a computationally cheap and simple check, one can detect datasets that are well-suited for the proposed algorithm, on which it outperforms the SOTA anomaly detection algorithms.
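The abstract does not specify the order in which points are removed or how the decision line is placed, so the following is only a minimal sketch of the volume-peeling idea under stated assumptions: 2D points, a greedy order that at each step removes the hull vertex whose removal shrinks the hull area the most, and a hypothetical stopping rule that halts once a removal cuts less than a fraction `min_rel_drop` of the current area. The function names (`convex_hull`, `hull_area`, `flag_outliers`) and both parameters are illustrative, not the authors' API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        # Pop while the last turn is not strictly counter-clockwise
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Area ("volume" in 2D) of the convex hull, via the shoelace formula."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    s = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def flag_outliers(points: List[Point], max_remove: int = 10,
                  min_rel_drop: float = 0.2) -> List[Point]:
    """Greedily peel hull vertices while each removal shrinks the area a lot.

    Assumption (hypothetical decision rule): stop at the first removal that
    reduces the hull area by less than min_rel_drop of the current area.
    """
    remaining = list(dict.fromkeys(points))  # drop duplicates, keep order
    outliers: List[Point] = []
    for _ in range(max_remove):
        hull = convex_hull(remaining)
        if len(hull) < 3:
            break
        current = hull_area(remaining)
        # Candidate: the hull vertex whose removal leaves the smallest hull
        best = min(hull, key=lambda v: hull_area([p for p in remaining if p != v]))
        new_area = hull_area([p for p in remaining if p != best])
        if current == 0 or (current - new_area) / current < min_rel_drop:
            break
        outliers.append(best)
        remaining = [p for p in remaining if p != best]
    return outliers
```

For example, on a 5 by 5 integer grid plus the far point (20, 20), removing (20, 20) shrinks the hull area from 80 to 16, so only that point is flagged; the next removal (a grid corner) cuts just 0.5 of 16 and the loop stops. In more than two dimensions the hand-written hull would be replaced by a d-dimensional one, for example the `volume` attribute of `scipy.spatial.ConvexHull`.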
Related papers
- An Efficient Outlier Detection Algorithm for Data Streaming [51.56874851156008]
Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data.
We propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm.
The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy as the number of additional points increases.
arXiv (2025-01-02T05:12:43Z)
- Unsupervised Anomaly Detection for Tabular Data Using Noise Evaluation [26.312206159418903]
Unsupervised anomaly detection (UAD) plays an important role in modern data analytics.
We present a novel UAD method based on evaluating how much noise is in the data.
We provide theoretical guarantees, proving that the proposed method can successfully detect anomalous data.
arXiv (2024-12-16T05:35:58Z)
- DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems [3.44012349879073]
We present DeepHYDRA (Deep Hybrid DBSCAN/Reduction-Based Anomaly Detection).
It combines DBSCAN and learning-based anomaly detection.
It is shown to reliably detect different types of anomalies in both large and complex datasets.
arXiv (2024-05-13T13:47:15Z)
- Bagged Regularized $k$-Distances for Anomaly Detection [9.899763598214122]
We propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD).
Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite-sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE).
On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity.
arXiv (2023-12-02T07:00:46Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv (2022-12-06T12:42:11Z)
- Framing Algorithmic Recourse for Anomaly Detection [18.347886926848563]
We present an approach, Context preserving Algorithmic Recourse for Anomalies in Tabular data (CARAT).
CARAT uses a transformer-based encoder-decoder model to explain an anomaly by finding features with low likelihood.
Semantically coherent counterfactuals are generated by modifying the highlighted features, using the overall context of features in the anomalous instance(s).
arXiv (2022-06-29T03:30:51Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv (2022-06-23T14:16:30Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv (2020-12-29T04:08:38Z)
- Am I Rare? An Intelligent Summarization Approach for Identifying Hidden Anomalies [0.0]
In this paper, we propose an INtelligent Summarization approach for IDENTifying hidden anomalies, called INSIDENT.
Our approach is a clustering-based algorithm that dynamically maps the original feature space to a new feature space by locally weighting features in each cluster. In addition, selecting representatives based on cluster size preserves the distribution of the original data in the summarized data.
arXiv (2020-12-24T23:22:57Z)
- Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization in imbalanced classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv (2020-11-04T16:49:29Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed that uses Bayesian optimization.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false alarm rate, and recall.
arXiv (2020-08-05T19:29:35Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv (2020-04-15T06:13:33Z)