Related papers: Null Hypothesis Test for Anomaly Detection

URL: http://arxiv.org/abs/2210.02226v1
Date: Wed, 5 Oct 2022 13:03:55 GMT
Title: Null Hypothesis Test for Anomaly Detection
Authors: Jernej F. Kamenik, Manuel Szewc
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions. The method relies on the assumption of conditional independence of anomaly score features and dataset regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information represents a suitable test for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
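The independence test described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a one-dimensional anomaly score per event and a binary region label (0 or 1), estimates mutual information with a simple plug-in histogram estimator, and calibrates it with a permutation test. The function names, the quantile binning, and the permutation calibration are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(score_bins, region, n_bins):
    """Plug-in mutual information (in nats) between a binned score and a binary region label."""
    joint = np.zeros((n_bins, 2))
    # Count events in each (score bin, region) cell.
    np.add.at(joint, (score_bins, region), 1.0)
    joint /= joint.sum()
    p_score = joint.sum(axis=1, keepdims=True)   # marginal over regions
    p_region = joint.sum(axis=0, keepdims=True)  # marginal over score bins
    product = p_score @ p_region                 # joint expected under independence
    mask = joint > 0                             # empty cells contribute zero
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def permutation_p_value(scores, region, n_bins=10, n_perm=500, seed=0):
    """Permutation test of independence between scores and region labels.

    Assumption for this sketch: under the background-only hypothesis the
    score is independent of the region, so shuffling the region labels
    samples the null distribution of the mutual information.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    region = np.asarray(region, dtype=int)
    # Quantile bin edges give roughly equal occupancy per score bin.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    score_bins = np.searchsorted(edges, scores, side="right")
    observed = mutual_information(score_bins, region, n_bins)
    null = np.array([
        mutual_information(score_bins, rng.permutation(region), n_bins)
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_p_value(scores, region)` returns the observed mutual information and a p-value; a small p-value disfavors independence of score and region. The smallest reachable p-value is `1 / (n_perm + 1)`, so `n_perm` bounds the significance this sketch can report.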

Related papers

A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables by binarizing observed data. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z)
Pre-validation Revisited [79.92204034170092]
We show properties and benefits of pre-validation in prediction, inference and error estimation by simulations and applications. We propose not only an analytical distribution of the test statistic for the pre-validated predictor under certain models, but also a generic bootstrap procedure to conduct inference.
arXiv Detail & Related papers (2025-05-21T00:20:14Z)
Internal Incoherency Scores for Constraint-based Causal Discovery Algorithms [12.524536193679124]
We propose internal coherency scores that allow testing for assumption violations and finite sample errors. We illustrate our coherency scores on the PC algorithm with simulated and real-world datasets.
arXiv Detail & Related papers (2025-02-20T16:44:54Z)
Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing. We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
Empirical Bayesian Approaches for Robust Constraint-based Causal Discovery under Insufficient Data [38.883810061897094]
Causal discovery methods assume data sufficiency, which may not hold in many real-world datasets. We propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data. Experiments show significant performance improvement in terms of both accuracy and efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2022-06-16T21:08:49Z)
Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes. No nonparametric test of conditional local independence has been available. We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model. We combine a classical parametric test (Rao's score test) with the recently introduced typicality test. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data. We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn Uncertainty Sets [12.061662346636645]
We seek the worst-case detector over distributional uncertainty sets centered around the empirical distribution from samples using Sinkhorn distance. Compared with the Wasserstein robust test, the corresponding least favorable distributions are supported beyond the training samples, which provides a more flexible detector.
arXiv Detail & Related papers (2022-02-09T03:26:15Z)
Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE is the density of states estimator. We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data. We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
Achieving Equalized Odds by Resampling Sensitive Attributes [13.114114427206678]
We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. We develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature.
arXiv Detail & Related papers (2020-06-08T00:18:34Z)
Universal Data Anomaly Detection via Inverse Generative Adversary Network [4.162663632560141]
No training data are available for the distribution of anomaly data. A semi-supervised deep learning technique based on an inverse generative adversary network is proposed.
arXiv Detail & Related papers (2020-01-23T21:11:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.