Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics
- URL: http://arxiv.org/abs/2503.23927v2
- Date: Wed, 02 Apr 2025 10:07:05 GMT
- Title: Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics
- Authors: Sebastian Springer, Andre Scaffidi, Maximilian Autenrieth, Gabriella Contardo, Alessandro Laio, Roberto Trotta, Heikki Haario
- Abstract summary: We introduce EagleEye, an anomaly detection method to compare two datasets. Anomalies are detected by modelling, for each point, the ordered sequence of its neighbours' membership label. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets.
- Score: 38.24458888666912
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Detecting localized density differences in multivariate data is a crucial task in computational science. Such anomalies can indicate a critical system failure, lead to a groundbreaking scientific discovery, or reveal unexpected changes in data distribution. We introduce EagleEye, an anomaly detection method to compare two multivariate datasets with the aim of identifying local density anomalies, namely over- or under-densities affecting only localised regions of the feature space. Anomalies are detected by modelling, for each point, the ordered sequence of its neighbours' membership label as a coin-flipping process and monitoring deviations from the expected behaviour of such process. A unique advantage of our method is its ability to provide an accurate, entirely unsupervised estimate of the local signal purity. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets. In synthetic data, EagleEye accurately detects anomalies in multiple dimensions even when they affect a tiny fraction of the data. When applied to a challenging resonant anomaly detection benchmark task in simulated Large Hadron Collider data, EagleEye successfully identifies particle decay events present in just 0.3% of the dataset. In global temperature data, EagleEye uncovers previously unidentified, geographically localised changes in temperature fields that occurred in the most recent years. Thanks to its key advantages of conceptual simplicity, computational efficiency, trivial parallelisation, and scalability, EagleEye is widely applicable across many fields.
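The coin-flip idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`binom_upper_tail`, `coin_flip_scores`), the parameter `k_max`, the brute-force neighbour search, and the choice of a binomial upper-tail p-value scanned over neighbourhood sizes are all assumptions made for illustration. Each test point's nearest neighbours in the pooled data are ordered by distance, each neighbour's membership label (test or reference) is treated as one coin flip, and the score records how surprising the run of test labels is if the two datasets were well mixed. Only over-densities in the test set are scored here; under-densities would need the roles of the two sets swapped.

```python
import math
import random


def binom_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, j) * p**j * (1.0 - p) ** (trials - j)
        for j in range(successes, trials + 1)
    )


def coin_flip_scores(test, reference, k_max=20):
    """Score each test point by how unlikely its neighbour labels are under the null."""
    # Pool both sets; label 1 marks a test point, 0 a reference point.
    pooled = [(x, 1) for x in test] + [(x, 0) for x in reference]
    # Null hypothesis: each neighbour is a test point with this probability.
    p_null = len(test) / len(pooled)
    scores = []
    for i, x in enumerate(test):
        # Order all other pooled points by distance to x (brute force, O(n log n)).
        others = [pt for j, pt in enumerate(pooled) if j != i]
        others.sort(key=lambda pt: math.dist(x, pt[0]))
        labels = [label for _, label in others[:k_max]]
        # Scan neighbourhood sizes 1..k_max and keep the most surprising one.
        best, count = 0.0, 0
        for k, label in enumerate(labels, start=1):
            count += label
            p_value = binom_upper_tail(count, k, p_null)
            best = max(best, -math.log(max(p_value, 1e-300)))
        scores.append(best)
    return scores


# Synthetic example (hypothetical data): a small, tight bump added to the test set.
rng = random.Random(0)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
background = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(280)]
bump = [(rng.gauss(3, 0.1), rng.gauss(3, 0.1)) for _ in range(20)]
scores = coin_flip_scores(background + bump, reference)

mean_background = sum(scores[:280]) / 280
mean_bump = sum(scores[280:]) / 20
print(f"mean score, background: {mean_background:.2f}")
print(f"mean score, bump:       {mean_bump:.2f}")
```

In this toy setting the bump points should receive clearly higher scores than the background points, because almost all of their nearest neighbours carry the test label. A practical version would replace the brute-force search with a spatial index and would have to account for scanning over many neighbourhood sizes when turning scores into significance statements.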
Related papers
- Explainable Unsupervised Anomaly Detection with Random Forest [1.0485739694839669]
We describe the use of an unsupervised Random Forest for similarity learning and improved anomaly detection.
By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution over the real data bounds, a distance measure is obtained that anisometrically transforms the data.
We show that using distances recovered from this transformation improves the accuracy of unsupervised anomaly detection, compared to other commonly used detectors.
arXiv Detail & Related papers (2025-04-22T17:54:44Z) - A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns, but not both. We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments. The dataset is twice as large as existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z) - Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination [20.4008901760593]
We introduce a systematic adaptive method that employs deviation learning to compute anomaly scores end-to-end.
Our proposed method surpasses competing techniques and exhibits both stability and robustness in the presence of data contamination.
arXiv Detail & Related papers (2024-11-14T16:10:15Z) - Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model [59.08735812631131]
Anomaly inspection plays an important role in industrial manufacture.
Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data.
We propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model.
arXiv Detail & Related papers (2023-12-10T05:13:40Z) - Exploring Global and Local Information for Anomaly Detection with Normal Samples [23.68962459770419]
Anomaly detection aims to detect data that do not conform to regular patterns; such data are also called outliers.
In many realistic scenarios, only the samples following normal behavior are observed, while we can hardly obtain any anomaly information.
We propose an anomaly detection method, GALDetector, which combines global and local information based on observed normal samples.
arXiv Detail & Related papers (2023-06-03T06:51:22Z) - HFN: Heterogeneous Feature Network for Multivariate Time Series Anomaly Detection [2.253268952202213]
We propose a novel semi-supervised anomaly detection framework based on a heterogeneous feature network (HFN) for MTS.
We first combine the embedding similarity subgraph generated by sensor embedding and feature value similarity subgraph generated by sensor values to construct a time-series heterogeneous graph.
This approach fuses the state-of-the-art technologies of heterogeneous graph structure learning (HGSL) and representation learning.
arXiv Detail & Related papers (2022-11-01T05:01:34Z) - Latent Outlier Exposure for Anomaly Detection with Contaminated Data [31.446666264334528]
Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset.
We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models.
arXiv Detail & Related papers (2022-02-16T14:21:28Z) - Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.