Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)
- URL: http://arxiv.org/abs/2408.15874v3
- Date: Wed, 30 Oct 2024 15:51:52 GMT
- Title: Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)
- Authors: Philipp Röchner, Henrique O. Marques, Ricardo J. G. B. Campello, Arthur Zimek, Franz Rothlauf,
- Abstract summary: Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier.
This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers.
We propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers.
- Score: 2.871927594197754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be difficult for humans to interpret. Statistical scaling addresses this problem by transforming outlier scores into outlier probabilities without using ground-truth labels, thereby improving interpretability and comparability across algorithms. However, the quality of this transformation can be different for outliers and inliers. Missing outliers in scenarios where they are of particular interest - such as healthcare, finance, or engineering - can be costly or dangerous. Thus, ensuring good probabilities for outliers is essential. This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers. Therefore, we propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers. We evaluate several variants of our method against other outlier score transformations for real-world datasets and outlier detection algorithms, where it can improve the probabilities for outliers.
Related papers
- SSB: Simple but Strong Baseline for Boosting Performance of Open-Set
Semi-Supervised Learning [106.46648817126984]
In this paper, we study the challenging and realistic open-set SSL setting.
The goal is to both correctly classify inliers and to detect outliers.
We find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data.
arXiv Detail & Related papers (2023-11-17T15:14:40Z) - A Probabilistic Transformation of Distance-Based Outliers [2.1055643409860743]
We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates.
The transformation is ranking-stable and increases the contrast between normal and outlier data points.
Our work generalizes to a wide range of distance-based outlier detection methods.
arXiv Detail & Related papers (2023-05-16T14:05:30Z) - Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z) - Normalizing Flow based Feature Synthesis for Outlier-Aware Object
Detection [8.249143014271887]
General-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects.
We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects.
Our approach significantly outperforms the state-of-the-art for outlier-aware object detection on both image and video datasets.
arXiv Detail & Related papers (2023-02-01T13:12:00Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the beta-potential term associated with the so-called $beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - C-AllOut: Catching & Calling Outliers by Type [10.69970450827617]
C-AllOut is a novel outlier detector that annotates outliers by type.
It is parameter-free and scalable, besides working only with pairwise similarities (or distances) when it is needed.
arXiv Detail & Related papers (2021-10-13T14:25:52Z) - Comparison of Outlier Detection Techniques for Structured Data [2.2907341026741017]
An outlier is an observation or a data point that is far from rest of the data points in a given dataset.
It is seen that the removal of outliers from the training dataset before modeling can give better predictions.
The goal of this work is to highlight and compare some of the existing outlier detection techniques for the data scientists to use that information for outlier algorithm selection.
arXiv Detail & Related papers (2021-06-16T13:40:02Z) - OpenMatch: Open-set Consistency Regularization for Semi-supervised
Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Homophily Outlier Detection in Non-IID Categorical Data [43.51919113927003]
This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
arXiv Detail & Related papers (2021-03-21T23:29:33Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a
Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.