Homophily Outlier Detection in Non-IID Categorical Data
- URL: http://arxiv.org/abs/2103.11516v1
- Date: Sun, 21 Mar 2021 23:29:33 GMT
- Title: Homophily Outlier Detection in Non-IID Categorical Data
- Authors: Guansong Pang, Longbing Cao, Ling Chen
- Abstract summary: This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
- Score: 43.51919113927003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of existing outlier detection methods assume that the outlier factors
(i.e., outlierness scoring measures) of data entities (e.g., feature values and
data objects) are Independent and Identically Distributed (IID). This
assumption does not hold in real-world applications where the outlierness of
different entities is dependent on each other and/or taken from different
probability distributions (non-IID). This may lead to the failure of detecting
important outliers that are too subtle to be identified without considering the
non-IID nature. The issue is even intensified in more challenging contexts,
e.g., high-dimensional data with many noisy features. This work introduces a
novel outlier detection framework and its two instances to identify outliers in
categorical data by capturing non-IID outlier factors. Our approach first
defines and incorporates distribution-sensitive outlier factors and their
interdependence into a value-value graph-based representation. It then models
an outlierness propagation process in the value graph to learn the outlierness
of feature values. The learned value outlierness allows for either direct
outlier detection or outlying feature selection. The graph representation and
mining approach is employed here to well capture the rich non-IID
characteristics. Our empirical results on 15 real-world data sets with
different levels of data complexities show that (i) the proposed outlier
detection methods significantly outperform five state-of-the-art methods at the
95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most
complex data sets; and (ii) the proposed feature selection methods
significantly outperform three competing methods in enabling subsequent outlier
detection of two different existing detectors.
Related papers
- RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples [4.76428036044684]
We introduce RODEO, a data-centric approach that generates effective outliers for robust outlier detection.
We show that incorporating outlier exposure (OE) and adversarial training can be an effective strategy for this purpose.
We demonstrate both quantitatively and qualitatively that our adaptive OE method effectively generates diverse'' and near-distribution'' outliers.
arXiv Detail & Related papers (2025-01-28T14:13:17Z) - Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data.
Most unsupervised outlier detection methods are carefully designed to detect specified outliers.
We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z) - Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD)
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z) - Rethinking Unsupervised Outlier Detection via Multiple Thresholding [15.686139522490189]
We propose a multiple thresholding (Multi-T) module to advance existing scoring methods.
It generates two thresholds that isolate inliers and outliers from the unlabelled target dataset.
Experiments verify that Multi-T can significantly improve proposed outlier scoring methods.
arXiv Detail & Related papers (2024-07-07T14:09:50Z) - Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z) - Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers)
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z) - Are we really making much progress in unsupervised graph outlier
detection? Revisiting the problem with new insight and superior method [36.72922385614812]
UNOD focuses on detecting two kinds of typical outliers in graphs: the structural outlier and the contextual outlier.
We find that the most widely-used outlier injection approach has a serious data leakage issue.
We propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model and attribute reconstruction model to detect outliers in a balanced way.
arXiv Detail & Related papers (2022-10-24T04:09:35Z) - Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z) - Do We Really Need to Learn Representations from In-domain Data for
Outlier Detection? [6.445605125467574]
Methods based on the two-stage framework achieve state-of-the-art performance on this task.
We explore the possibility of avoiding the high cost of training a distinct representation for each outlier detection task.
In experiments, we demonstrate competitive or better performance on a variety of outlier detection benchmarks compared with previous two-stage methods.
arXiv Detail & Related papers (2021-05-19T17:30:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.