C-AllOut: Catching & Calling Outliers by Type
- URL: http://arxiv.org/abs/2110.08257v1
- Date: Wed, 13 Oct 2021 14:25:52 GMT
- Title: C-AllOut: Catching & Calling Outliers by Type
- Authors: Guilherme D. F. Silva, Leman Akoglu, Robson L. F. Cordeiro
- Abstract summary: C-AllOut is a novel outlier detector that annotates outliers by type.
It is parameter-free and scalable, and it can operate on pairwise similarities (or distances) alone when needed.
- Score: 10.69970450827617
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Given an unlabeled dataset, wherein we have access only to pairwise
similarities (or distances), how can we effectively (1) detect outliers, and
(2) annotate/tag the outliers by type? Outlier detection has a large
literature, yet we find a key gap in the field: to our knowledge, no existing
work addresses the outlier annotation problem. Outliers are broadly classified
into 3 types, representing distinct patterns that could be valuable to
analysts: (a) global outliers are severe yet isolated cases that do not repeat,
e.g., a data collection error; (b) local outliers diverge from their peers
within a context, e.g., a particularly short basketball player; and (c)
collective outliers are isolated micro-clusters that may indicate coalition or
repetitions, e.g., frauds that exploit the same loophole. This paper presents
C-AllOut: a novel and effective outlier detector that annotates outliers by
type. It is parameter-free and scalable, and it can operate on pairwise
similarities (or distances) alone when needed. We show that C-AllOut performs
on par with or significantly better than state-of-the-art detectors when
spotting outliers regardless of their type. It is also highly effective in
annotating outliers of particular types, a task that none of the baselines can
perform.
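To make the problem setting concrete, here is a minimal sketch of outlier scoring that uses only a pairwise distance matrix. It is a generic k-nearest-neighbor distance baseline and is not the C-AllOut algorithm; the function names, the toy points, and the choice of k are all assumptions made for illustration. It ranks points but does not annotate them by type, which is the task the abstract says C-AllOut addresses.

```python
import math


def knn_distance_scores(dist, k):
    """Score each point by the distance to its k-th nearest neighbor.

    `dist` is a symmetric n x n matrix of pairwise distances; no feature
    vectors are needed. A larger score means a more outlying point.
    Generic k-NN distance baseline, NOT the C-AllOut algorithm.
    """
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    scores = []
    for i in range(n):
        # Distances from point i to every other point, in ascending order.
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(others[k - 1])
    return scores


def pairwise(points):
    """Euclidean distance matrix, used here only to build toy input."""
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical toy data: a tight group of inliers, one far-away isolated
# point (global-outlier-like), and a two-point micro-cluster
# (collective-outlier-like).
inliers = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
isolated = [(10.0, 10.0)]
micro_cluster = [(5.0, -5.0), (5.1, -5.0)]
points = inliers + isolated + micro_cluster

D = pairwise(points)

# With k=2, each micro-cluster point must look beyond its single companion,
# so both the isolated point and the micro-cluster receive high scores.
scores = knn_distance_scores(D, k=2)
ranking = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print(ranking[:3])
```

In this toy setup, the isolated point (index 5) ranks first and the two micro-cluster points (indices 6 and 7) follow. With k=1, the micro-cluster points would score about 0.1, comparable to the inliers, because each has a close companion. This masking effect is one reason micro-clusters are treated as a separate outlier type.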
Related papers
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning [106.46648817126984]
In this paper, we study the challenging and realistic open-set SSL setting.
The goal is to both correctly classify inliers and to detect outliers.
We find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data.
arXiv Detail & Related papers (2023-11-17T15:14:40Z)
- IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization [36.102831230805755]
In many real-world applications, unlabeled data will inevitably contain unseen-class outliers not belonging to any of the labeled classes.
We introduce a novel open-set SSL framework, IOMatch, which can jointly utilize inliers and outliers, even when it is difficult to distinguish exactly between them.
arXiv Detail & Related papers (2023-08-25T04:14:02Z)
- Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
- ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models [4.956259629094216]
The unsupervised outlier detection (UOD) problem refers to the task of identifying inliers given training data that contain both outliers and inliers.
We develop a new method called outlier detection via the IM effect (ODIM).
Remarkably, ODIM requires only a few updates, making it computationally efficient and at least tens of times faster than other deep-learning-based algorithms.
arXiv Detail & Related papers (2023-01-11T01:02:27Z)
- Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval.
One possible approach is self-supervised learning (SSL).
We are the first to introduce non-contrastive SSL for well-logging data.
arXiv Detail & Related papers (2022-09-28T13:27:10Z)
- Unsupervised Outlier Detection using Memory and Contrastive Learning [53.77693158251706]
We think outlier detection can be done in the feature space by measuring the feature distance between outliers and inliers.
We propose a framework, MCOD, using a memory module and a contrastive learning module.
Our proposed MCOD achieves considerable performance and outperforms nine state-of-the-art methods.
arXiv Detail & Related papers (2021-07-27T07:35:42Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- Homophily Outlier Detection in Non-IID Categorical Data [43.51919113927003]
This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
arXiv Detail & Related papers (2021-03-21T23:29:33Z)
- Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data [0.0]
Benchmarking unsupervised outlier detection is difficult.
We propose a generic process for the generation of data sets for such benchmarking.
We describe three instantiations of the generic process that generate outliers with specific characteristics.
arXiv Detail & Related papers (2020-04-15T08:55:47Z)
- Finding Outliers in Gaussian Model-Based Clustering [1.0435741631709405]
Clustering, or unsupervised classification, is a task often plagued by outliers.
There is a paucity of work on handling outliers in clustering.
arXiv Detail & Related papers (2019-07-02T03:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.