Homophily Outlier Detection in Non-IID Categorical Data
- URL: http://arxiv.org/abs/2103.11516v1
- Date: Sun, 21 Mar 2021 23:29:33 GMT
- Title: Homophily Outlier Detection in Non-IID Categorical Data
- Authors: Guansong Pang, Longbing Cao, Ling Chen
- Abstract summary: This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
- Score: 43.51919113927003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing outlier detection methods assume that the outlier factors
(i.e., outlierness scoring measures) of data entities (e.g., feature values and
data objects) are Independent and Identically Distributed (IID). This
assumption does not hold in real-world applications where the outlierness of
different entities is dependent on each other and/or taken from different
probability distributions (non-IID). This may lead to the failure of detecting
important outliers that are too subtle to be identified without considering the
non-IID nature. The issue is even intensified in more challenging contexts,
e.g., high-dimensional data with many noisy features. This work introduces a
novel outlier detection framework and its two instances to identify outliers in
categorical data by capturing non-IID outlier factors. Our approach first
defines and incorporates distribution-sensitive outlier factors and their
interdependence into a value-value graph-based representation. It then models
an outlierness propagation process in the value graph to learn the outlierness
of feature values. The learned value outlierness allows for either direct
outlier detection or outlying feature selection. The graph representation and
mining approach is employed here to effectively capture the rich non-IID
characteristics. Our empirical results on 15 real-world data sets with
different levels of data complexity show that (i) the proposed outlier
detection methods significantly outperform five state-of-the-art methods at the
95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most
complex data sets; and (ii) the proposed feature selection methods
significantly outperform three competing methods in enabling subsequent outlier
detection of two different existing detectors.
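The abstract describes a value-value graph whose nodes are feature values and an outlierness propagation process over that graph. The block below is a minimal sketch of that general idea, not the authors' formulation: the initial outlier factor (deviation from the feature's mode frequency), the edge weights (conditional co-occurrence probability biased by the target value's initial factor), the damped random walk, and the names `value_outlierness` and `object_scores` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def value_outlierness(rows, damping=0.85, iters=100):
    """Score every (feature index, value) node by a biased random walk.

    rows: list of equal-length tuples of categorical values.
    Illustrative only; not the scoring functions defined in the paper.
    """
    n_features = len(rows[0])
    # Node frequencies and pairwise co-occurrence counts within each row.
    freq = Counter((j, row[j]) for row in rows for j in range(n_features))
    cooc = Counter()
    for row in rows:
        vals = [(j, row[j]) for j in range(n_features)]
        for u, v in combinations(vals, 2):
            cooc[(u, v)] += 1
            cooc[(v, u)] += 1
    nodes = sorted(freq)

    # Assumed initial factor: 0 for a feature's mode, near 1 for rare values.
    mode = {}
    for (j, _), count in freq.items():
        mode[j] = max(mode.get(j, 0), count)
    init = {v: 1.0 - freq[v] / mode[v[0]] for v in nodes}

    # Edge u -> v weighted by P(v | u) * init[v], then row-normalised.
    trans = {}
    for u in nodes:
        w = {v: init[v] * cooc[(u, v)] / freq[u]
             for v in nodes if cooc[(u, v)] > 0}
        total = sum(w.values())
        trans[u] = {v: x / total for v, x in w.items()} if total > 0 else {}

    # Damped power iteration; nodes without outgoing weight jump uniformly.
    k = len(nodes)
    score = {v: 1.0 / k for v in nodes}
    for _ in range(iters):
        dangling = sum(score[u] for u in nodes if not trans[u])
        nxt = {v: (1.0 - damping) / k + damping * dangling / k for v in nodes}
        for u in nodes:
            for v, p in trans[u].items():
                nxt[v] += damping * score[u] * p
        score = nxt
    return score


def object_scores(rows, score):
    """Object outlierness as the sum of the scores of the values it contains."""
    return [sum(score[(j, val)] for j, val in enumerate(row)) for row in rows]


if __name__ == "__main__":
    data = [("a", "x")] * 9 + [("b", "y")]
    values = value_outlierness(data)
    ranked = sorted(zip(object_scores(data, values), data), reverse=True)
    print(ranked[0])  # highest-scoring object in the toy data
```

The same value scores could also be aggregated per feature to rank features, which corresponds loosely to the outlying feature selection use mentioned in the abstract; that aggregation is likewise an assumption here.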
Related papers
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Rethinking Unsupervised Outlier Detection via Multiple Thresholding [15.686139522490189]
We propose a multiple thresholding (Multi-T) module to advance existing scoring methods.
It generates two thresholds that isolate inliers and outliers from the unlabelled target dataset.
Experiments verify that Multi-T can significantly improve proposed outlier scoring methods.
arXiv Detail & Related papers (2024-07-07T14:09:50Z)
- Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
- Are we really making much progress in unsupervised graph outlier detection? Revisiting the problem with new insight and superior method [36.72922385614812]
UNOD focuses on detecting two kinds of typical outliers in graphs: the structural outlier and the contextual outlier.
We find that the most widely-used outlier injection approach has a serious data leakage issue.
We propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model and attribute reconstruction model to detect outliers in a balanced way.
arXiv Detail & Related papers (2022-10-24T04:09:35Z)
- Unsupervised Outlier Detection using Memory and Contrastive Learning [53.77693158251706]
We think outlier detection can be done in the feature space by measuring the feature distance between outliers and inliers.
We propose a framework, MCOD, using a memory module and a contrastive learning module.
Our proposed MCOD achieves considerable performance and outperforms nine state-of-the-art methods.
arXiv Detail & Related papers (2021-07-27T07:35:42Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Do We Really Need to Learn Representations from In-domain Data for Outlier Detection? [6.445605125467574]
Methods based on the two-stage framework achieve state-of-the-art performance on this task.
We explore the possibility of avoiding the high cost of training a distinct representation for each outlier detection task.
In experiments, we demonstrate competitive or better performance on a variety of outlier detection benchmarks compared with previous two-stage methods.
arXiv Detail & Related papers (2021-05-19T17:30:28Z)