Related papers: Rare anomalies require large datasets: About proving the existence of anomalies

URL: http://arxiv.org/abs/2508.09894v1
Date: Wed, 13 Aug 2025 15:52:33 GMT
Title: Rare anomalies require large datasets: About proving the existence of anomalies
Authors: Simon Klüttermann, Emmanuel Müller
Abstract summary: This paper presents a comprehensive study that addresses the fundamental question: When can we conclusively determine that anomalies are present? We identify a relationship between the dataset size, contamination rate, and an algorithm-dependent constant $ \alpha_{\text{algo}} $. Our results demonstrate that, for an unlabeled dataset of size $ N $ and contamination rate $ \nu $, the condition $ N \ge \frac{\alpha_{\text{algo}}}{\nu^2} $ represents a lower bound on the number of samples required to confirm anomaly existence.
Score: 5.555497750998242
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Detecting whether any anomalies exist within a dataset is crucial for effective anomaly detection, yet it remains surprisingly underexplored in anomaly detection literature. This paper presents a comprehensive study that addresses the fundamental question: When can we conclusively determine that anomalies are present? Through extensive experimentation involving over three million statistical tests across various anomaly detection tasks and algorithms, we identify a relationship between the dataset size, contamination rate, and an algorithm-dependent constant $ \alpha_{\text{algo}} $. Our results demonstrate that, for an unlabeled dataset of size $ N $ and contamination rate $ \nu $, the condition $ N \ge \frac{\alpha_{\text{algo}}}{\nu^2} $ represents a lower bound on the number of samples required to confirm anomaly existence. This threshold implies a limit to how rare anomalies can be before proving their existence becomes infeasible.
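To make the bound concrete, the following is a minimal sketch of the arithmetic implied by $ N \ge \frac{\alpha_{\text{algo}}}{\nu^2} $. The function names and the example value of `alpha_algo` are hypothetical illustrations; the abstract states only that the constant depends on the algorithm and does not report a value here. Rearranging the same inequality gives $ \nu \ge \sqrt{\alpha_{\text{algo}} / N} $, which is the rarity limit mentioned in the last sentence of the abstract.

```python
import math


def min_samples(alpha_algo: float, nu: float) -> int:
    """Smallest integer N satisfying N >= alpha_algo / nu**2."""
    if alpha_algo <= 0:
        raise ValueError("alpha_algo must be positive")
    if not 0 < nu <= 1:
        raise ValueError("nu must be a contamination rate in (0, 1]")
    return math.ceil(alpha_algo / nu**2)


def min_contamination(alpha_algo: float, n: int) -> float:
    """Smallest nu satisfying the same inequality for a fixed dataset size n."""
    if alpha_algo <= 0 or n <= 0:
        raise ValueError("alpha_algo and n must be positive")
    return math.sqrt(alpha_algo / n)


if __name__ == "__main__":
    alpha = 1.0  # hypothetical value, chosen only for illustration
    for nu in (0.5, 0.25, 0.125):
        print(f"nu={nu}: need at least N={min_samples(alpha, nu)} samples")
    print(f"N=10000: nu must be at least {min_contamination(alpha, 10_000)}")
```

Because $ N $ scales with $ 1/\nu^2 $, halving the contamination rate quadruples the required number of samples under this bound.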

Related papers

KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models [54.63075553088399]
Key Knowledge Augmentation (KKA) is a method that extracts anomaly-related knowledge from large language models (LLMs). KKA classifies the generated anomalies as easy anomalies and hard anomalies according to their similarity to normal samples. Experimental results show that the proposed method significantly improves the performance of various vision anomaly detectors.
arXiv Detail & Related papers (2025-02-14T07:46:49Z)
SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs [11.819993729810257]
Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. Graph neural networks have become increasingly popular in tackling the anomaly detection problem. We present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs.
arXiv Detail & Related papers (2023-05-23T01:05:34Z)
MetaGAD: Meta Representation Adaptation for Few-Shot Graph Anomaly Detection [31.218962952724624]
We study an important problem of few-shot graph anomaly detection. We propose a novel meta-learning based framework, MetaGAD, that learns to adapt the knowledge from self-supervised learning to few-shot supervised learning. Specifically, we formulate the problem as a bi-level optimization, ensuring that MetaGAD converges to minimize the validation loss.
arXiv Detail & Related papers (2023-05-18T03:04:51Z)
AGAD: Adversarial Generative Anomaly Detection [12.68966318231776]
Anomaly detection suffers from a lack of anomalies due to the diversity of abnormalities and the difficulty of obtaining large-scale anomaly data. We propose Adversarial Generative Anomaly Detection (AGAD), a self-contrast-based anomaly detection paradigm. Our method generates pseudo-anomaly data for both supervised and semi-supervised anomaly detection scenarios.
arXiv Detail & Related papers (2023-04-09T10:40:02Z)
Are we certain it's anomalous? [57.729669157989235]
Anomaly detection in time series is a complex task since anomalies are rare due to highly non-linear temporal correlations. Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD). HypAD learns self-supervisedly to reconstruct the input signal.
arXiv Detail & Related papers (2022-11-16T21:31:39Z)
Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection [90.32910087103744]
A few labeled anomaly examples are often available in many real-world applications. These anomaly examples provide valuable knowledge about the application-specific abnormality. Those anomalies seen during training often do not illustrate every possible class of anomaly. This paper tackles open-set supervised anomaly detection.
arXiv Detail & Related papers (2022-03-28T05:21:37Z)
Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models. The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability. Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
Understanding the Effect of Bias in Deep Anomaly Detection [15.83398707988473]
Anomaly detection presents a unique challenge in machine learning, due to the scarcity of labeled anomaly data. Recent work attempts to mitigate such problems by augmenting training of deep anomaly detection models with additional labeled anomaly samples. In this paper, we aim to understand the effect of a biased anomaly set on anomaly detection.
arXiv Detail & Related papers (2021-05-16T03:55:02Z)
Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset. Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data. We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
Deep Weakly-supervised Anomaly Detection [118.55172352231381]
Pairwise Relation prediction Network (PReNet) learns pairwise relation features and anomaly scores. PReNet can detect any seen/unseen abnormalities that fit the learned pairwise abnormal patterns. Empirical results on 12 real-world datasets show that PReNet significantly outperforms nine competing methods in detecting seen and unseen anomalies.
arXiv Detail & Related papers (2019-10-30T00:40:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.