A Probabilistic Transformation of Distance-Based Outliers
- URL: http://arxiv.org/abs/2305.09446v2
- Date: Tue, 18 Jul 2023 20:01:42 GMT
- Title: A Probabilistic Transformation of Distance-Based Outliers
- Authors: David Muhr, Michael Affenzeller, Josef Küng
- Abstract summary: We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates.
The transformation is ranking-stable and increases the contrast between normal and outlier data points.
Our work generalizes to a wide range of distance-based outlier detection methods.
- Score: 2.1055643409860743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scores of distance-based outlier detection methods are difficult to
interpret, making it challenging to determine a cut-off threshold between
normal and outlier data points without additional context. We describe a
generic transformation of distance-based outlier scores into interpretable,
probabilistic estimates. The transformation is ranking-stable and increases the
contrast between normal and outlier data points. Determining distance
relationships between data points is necessary to identify the nearest-neighbor
relationships in the data, yet most of the computed distances are typically
discarded. We show that the distances to other data points can be used to model
distance probability distributions and, subsequently, use the distributions to
turn distance-based outlier scores into outlier probabilities. Our experiments
show that the probabilistic transformation does not impact detection
performance over numerous tabular and image benchmark datasets but results in
interpretable outlier scores with increased contrast between normal and outlier
samples. Our work generalizes to a wide range of distance-based outlier
detection methods, and because existing distance computations are used, it adds
no significant computational overhead.
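The abstract describes the transformation only at a high level; below is a minimal sketch of the idea in Python. The k-NN distance as the outlier score, the Gamma distribution as the distance model, and the global fit over all neighbor distances are assumptions made for illustration, not the paper's prescribed design.

```python
import numpy as np
from scipy import stats
from sklearn.neighbors import NearestNeighbors

def knn_outlier_probabilities(X, k=5):
    """Turn k-NN distance scores into outlier probabilities.

    Illustrative sketch only: the distribution family (Gamma) and the
    global fit over all neighbor distances are assumptions, not the
    paper's prescribed choices.
    """
    # Nearest-neighbor search; the closest neighbor of each point is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    neighbor_distances = distances[:, 1:]  # drop the zero self-distance

    # Distance-based outlier score: distance to the k-th nearest neighbor.
    scores = neighbor_distances[:, -1]

    # Model a distance distribution from the distances the neighbor search
    # already computed, instead of discarding them.
    flat = neighbor_distances.ravel()
    shape, loc, scale = stats.gamma.fit(flat, floc=0.0)

    # Evaluate each score under the fitted CDF: a monotone map from raw
    # distances to probabilities in [0, 1].
    return stats.gamma.cdf(scores, shape, loc=loc, scale=scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, size=(200, 2)),   # inliers
                   rng.normal(6, 1, size=(5, 2))])    # outliers
    probs = knn_outlier_probabilities(X, k=5)
    print(probs[-5:])  # the appended outliers should receive the highest probabilities
```

Because a CDF is monotone, this mapping preserves the ranking of the raw scores while compressing them into [0, 1], which matches the ranking-stability property stated in the abstract.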
Related papers
- Directional anomaly detection [4.174296652683762]
Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data.
We present two asymmetrical distance measures that take the direction of deviation from the normal data into account: ramp distance and signed distance.
arXiv Detail & Related papers (2024-10-30T16:11:40Z)
- Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version) [2.871927594197754]
Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier.
This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers.
We propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers.
arXiv Detail & Related papers (2024-08-28T15:44:34Z)
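The blurb above does not spell out the estimators; the sketch below illustrates statistical scaling in the style of Kriegel et al.'s "Interpreting and Unifying Outlier Scores" (Gaussian scaling of standardized scores), with the mean and standard deviation swapped for the median and MAD as one plausible robust choice. These choices are assumptions, not necessarily the paper's exact method.

```python
import numpy as np
from scipy import stats

def statistical_scaling(scores, robust=True):
    """Map raw outlier scores to [0, 1] probabilities via Gaussian scaling.

    Sketch only: median/MAD as the robust location/scale estimators is an
    assumption about what "robust estimators" means in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    if robust:
        loc = np.median(scores)
        scale = stats.median_abs_deviation(scores, scale="normal")  # Gaussian-consistent MAD
    else:
        loc, scale = scores.mean(), scores.std()
    # Only scores above the estimated location count as evidence of outlierness:
    # max(0, erf((s - loc) / (scale * sqrt(2)))) == max(0, 2 * Phi((s - loc) / scale) - 1).
    return np.maximum(0.0, 2.0 * stats.norm.cdf(scores, loc=loc, scale=scale) - 1.0)
```

With robust estimators the extreme scores barely influence the fitted location and scale, which is the kind of effect the robust variant is after.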
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
- Positive Difference Distribution for Image Outlier Detection using Normalizing Flows and Contrastive Data [2.9005223064604078]
Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an outlier score.
We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for outlier detection.
We show that this is equivalent to learning the normalized positive difference between the in-distribution and the contrastive feature density.
arXiv Detail & Related papers (2022-08-30T07:00:46Z)
- Robust Multi-Object Tracking by Marginal Inference [92.48078680697311]
Multi-object tracking in videos requires solving a fundamental problem of one-to-one assignment between objects in adjacent frames.
We present an efficient approach to compute a marginal probability for each pair of objects in real time.
It achieves competitive results on MOT17 and MOT20 benchmarks.
arXiv Detail & Related papers (2022-08-07T14:04:45Z)
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
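The summary only states that kdiff combines self and cross similarities through a lower quantile of the distance distribution; the sketch below is one illustrative reading of that description. The kernel-induced distance, the quantile level, and the way the self and cross quantiles are combined are all assumptions, not the paper's actual definition.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kdiff_like(x, y, q=0.1, gamma=1.0):
    """Quantile-based distance between two structured instances x and y,
    each given as an array of samples (e.g., points or windows of a time series).

    Illustrative only: the RBF-induced distance, the quantile level q, and the
    self/cross combination are assumptions based on the summary above.
    """
    def kernel_distance(a, b):
        # Distance induced by an RBF kernel: d^2 = k(a,a) + k(b,b) - 2 k(a,b).
        sq = cdist(a, b, "sqeuclidean")
        return np.sqrt(np.maximum(0.0, 2.0 - 2.0 * np.exp(-gamma * sq)))

    cross = np.quantile(kernel_distance(x, y), q)    # cross similarities
    self_x = np.quantile(kernel_distance(x, x), q)   # self similarities
    self_y = np.quantile(kernel_distance(y, y), q)
    # Compare the lower quantile of cross distances against the self baselines.
    return cross - 0.5 * (self_x + self_y)
```

Each instance is passed as an array of samples, so kdiff_like(a, b) compares the lower tail of cross-instance distances against each instance's own lower tail.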
- The Exploitation of Distance Distributions for Clustering [3.42658286826597]
In cluster analysis, different properties for distance distributions are judged to be relevant for appropriate distance selection.
A systematic investigation using distribution analysis with a mirrored-density plot shows that multimodal distance distributions are preferable in cluster analysis.
Experiments are performed on several artificial datasets and natural datasets for the task of clustering.
arXiv Detail & Related papers (2021-08-22T06:22:08Z)
- On the relation between statistical learning and perceptual distances [61.25815733012866]
We show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood.
We also explore the relation between distances induced by autoencoders and the probability distribution of the data used for training them.
arXiv Detail & Related papers (2021-06-08T14:56:56Z)
- Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features.
Our method produces state-of-the-art results in several challenging landmark detection datasets.
arXiv Detail & Related papers (2021-04-07T05:42:11Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)