Related papers: A Probabilistic Transformation of Distance-Based Outliers

URL: http://arxiv.org/abs/2305.09446v2
Date: Tue, 18 Jul 2023 20:01:42 GMT
Title: A Probabilistic Transformation of Distance-Based Outliers
Authors: David Muhr, Michael Affenzeller, Josef Küng
Abstract summary: We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Our work generalizes to a wide range of distance-based outlier detection methods.
Score: 2.1055643409860743
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The scores of distance-based outlier detection methods are difficult to interpret, making it challenging to determine a cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Our experiments show that the probabilistic transformation does not impact detection performance over numerous tabular and image benchmark datasets but results in interpretable outlier scores with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and because existing distance computations are used, it adds no significant computational overhead.
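
The idea in the abstract can be illustrated with a small sketch. This is not the authors' method: it assumes, purely for illustration, that the distance distribution is modeled by an empirical CDF over all pooled pairwise distances, and that the outlier score is the distance to the k-th nearest neighbor. The function names `knn_scores_and_distances` and `scores_to_probabilities` are hypothetical. Because an empirical CDF is monotone, the mapping never reverses the order of two scores, and it reuses distances that were already computed for the neighbor search.

```python
import numpy as np


def knn_scores_and_distances(X, k=5):
    """Return k-th nearest-neighbor distances and the full distance matrix."""
    # Pairwise Euclidean distances between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself from the neighbor search.
    np.fill_diagonal(D, np.inf)
    # Outlier score: distance to the k-th nearest neighbor.
    scores = np.sort(D, axis=1)[:, k - 1]
    return scores, D


def scores_to_probabilities(scores, D):
    """Map scores to [0, 1] with an empirical CDF of pooled distances.

    Assumption (not taken from the paper): one global distance
    distribution, estimated from the upper triangle of D.
    """
    upper = np.triu_indices_from(D, k=1)
    pooled = np.sort(D[upper])
    # Fraction of pooled distances that are <= each score.
    return np.searchsorted(pooled, scores, side="right") / pooled.size


# Example: a Gaussian cluster plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
scores, D = knn_scores_and_distances(X, k=5)
probabilities = scores_to_probabilities(scores, D)
```

In this sketch the distant point receives the largest value, while points inside the cluster map to much smaller values, since most pooled distances exceed a typical k-th nearest-neighbor distance.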

Related papers

Explainable Unsupervised Anomaly Detection with Random Forest [1.0485739694839669]
We describe the use of an unsupervised Random Forest for similarity learning and improved anomaly detection. By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution over the real data bounds, a distance measure is obtained that anisometrically transforms the data. We show that using distances recovered from this transformation improves the accuracy of unsupervised anomaly detection, compared to other commonly used detectors.
arXiv Detail & Related papers (2025-04-22T17:54:44Z)
Directional anomaly detection [4.174296652683762]
Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. We present two asymmetrical distance measures that take this directionality into account: ramp distance and signed distance.
arXiv Detail & Related papers (2024-10-30T16:11:40Z)
Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version) [2.871927594197754]
Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers. We propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers.
arXiv Detail & Related papers (2024-08-28T15:44:34Z)
Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points. It cannot be assumed that all users sample from the same underlying distribution. We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
Positive Difference Distribution for Image Outlier Detection using Normalizing Flows and Contrastive Data [2.9005223064604078]
Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an outlier score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for outlier detection. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the contrastive feature density.
arXiv Detail & Related papers (2022-08-30T07:00:46Z)
Robust Multi-Object Tracking by Marginal Inference [92.48078680697311]
Multi-object tracking in videos requires solving a fundamental problem of one-to-one assignment between objects in adjacent frames. We present an efficient approach to compute a marginal probability for each pair of objects in real time. It achieves competitive results on MOT17 and MOT20 benchmarks.
arXiv Detail & Related papers (2022-08-07T14:04:45Z)
Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data. It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
The Exploitation of Distance Distributions for Clustering [3.42658286826597]
In cluster analysis, different properties for distance distributions are judged to be relevant for appropriate distance selection. By systematically investigating this specification using distribution analysis through a mirrored-density plot, it is shown that multimodal distance distributions are preferable in cluster analysis. Experiments are performed on several artificial datasets and natural datasets for the task of clustering.
arXiv Detail & Related papers (2021-08-22T06:22:08Z)
On the relation between statistical learning and perceptual distances [61.25815733012866]
We show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood. We also explore the relation between distances induced by autoencoders and the probability distribution of the data used for training them.
arXiv Detail & Related papers (2021-06-08T14:56:56Z)
Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features. Our method produces state-of-the-art results in several challenging landmark detection datasets.
arXiv Detail & Related papers (2021-04-07T05:42:11Z)
$\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure. Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)