Related papers: Homophily Outlier Detection in Non-IID Categorical Data

Homophily Outlier Detection in Non-IID Categorical Data

URL: http://arxiv.org/abs/2103.11516v1
Date: Sun, 21 Mar 2021 23:29:33 GMT
Title: Homophily Outlier Detection in Non-IID Categorical Data
Authors: Guansong Pang, Longbing Cao, Ling Chen
Abstract summary: This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data. It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation. The learned value outlierness allows for either direct outlier detection or outlying feature selection.
Score: 43.51919113927003
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). This can cause a failure to detect important outliers that are too subtle to be identified without considering the non-IID nature. The issue is intensified further in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data by capturing non-IID outlier factors. Our approach first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation. It then models an outlierness propagation process in the value graph to learn the outlierness of feature values. The learned value outlierness allows for either direct outlier detection or outlying feature selection. The graph representation and mining approach is employed here to capture the rich non-IID characteristics. Our empirical results on 15 real-world data sets with different levels of data complexity show that (i) the proposed outlier detection methods significantly outperform five state-of-the-art methods at the 95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most complex data sets; and (ii) the proposed feature selection methods significantly outperform three competing methods in enabling subsequent outlier detection of two different existing detectors.
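Illustrative sketch (not the authors' implementation): the abstract describes building a value-value graph and propagating outlierness over it to score feature values. The Python below is one simplified way to picture that idea, under stated assumptions. The initial factor (distance of a value's frequency from its feature's mode), the edge weights (co-occurrence count times the destination's initial factor), the damped power iteration, and the names value_outlierness and object_scores are all choices made for this example and may differ from the paper's definitions.

```python
from collections import Counter
from itertools import combinations


def value_outlierness(rows, damping=0.95, iters=200):
    """Score each (feature index, value) node by a damped, biased random walk.

    Assumes every row has the same length and at least two features.
    """
    n, d = len(rows), len(rows[0])
    # Frequency of each value within its own feature.
    freq = [Counter(r[j] for r in rows) for j in range(d)]
    nodes = [(j, v) for j in range(d) for v in sorted(freq[j])]

    # Assumed initial factor: how far a value's frequency falls below the
    # frequency of its feature's mode; the 1/n offset keeps it positive.
    init = {}
    for j, v in nodes:
        mode = max(freq[j].values())
        init[(j, v)] = (mode - freq[j][v]) / mode + 1.0 / n

    # Co-occurrence counts between values of different features.
    co = Counter()
    for r in rows:
        for j, k in combinations(range(d), 2):
            co[(j, r[j]), (k, r[k])] += 1
            co[(k, r[k]), (j, r[j])] += 1

    # Row-normalised transitions, biased toward neighbours that already
    # look outlying (assumed weighting for this sketch).
    trans = {}
    for u in nodes:
        w = {v: init[v] * co[u, v] for v in nodes if co[u, v] > 0}
        total = sum(w.values())
        trans[u] = {v: x / total for v, x in w.items()}

    # Damped power iteration; the scores keep summing to 1.
    score = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / len(nodes) for u in nodes}
        for u, out in trans.items():
            for v, p in out.items():
                nxt[v] += damping * score[u] * p
        score = nxt
    return score


def object_scores(rows, scores):
    """Direct detection: an object's score is the sum of its values' scores."""
    return [sum(scores[(j, v)] for j, v in enumerate(r)) for r in rows]


if __name__ == "__main__":
    data = [("a", "x")] * 4 + [("a", "y")]
    learned = value_outlierness(data)
    print(object_scores(data, learned))
```

In this toy data the last row carries the rare value "y", so it receives the largest object score. Ranking features by the scores of their values would be the analogous route to outlying feature selection.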

Related papers

RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples [4.76428036044684]
We introduce RODEO, a data-centric approach that generates effective outliers for robust outlier detection. We show that incorporating outlier exposure (OE) and adversarial training can be an effective strategy for this purpose. We demonstrate both quantitatively and qualitatively that our adaptive OE method effectively generates "diverse" and "near-distribution" outliers.
arXiv Detail & Related papers (2025-01-28T14:13:17Z)
Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data. Most unsupervised outlier detection methods are carefully designed to detect specified outliers. We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD). In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency. Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
Rethinking Unsupervised Outlier Detection via Multiple Thresholding [15.686139522490189]
We propose a multiple thresholding (Multi-T) module to advance existing scoring methods. It generates two thresholds that isolate inliers and outliers from the unlabelled target dataset. Experiments verify that Multi-T can significantly improve proposed outlier scoring methods.
arXiv Detail & Related papers (2024-07-07T14:09:50Z)
Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment. We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers). We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
Are we really making much progress in unsupervised graph outlier detection? Revisiting the problem with new insight and superior method [36.72922385614812]
Unsupervised node outlier detection (UNOD) focuses on detecting two kinds of typical outliers in graphs: the structural outlier and the contextual outlier. We find that the most widely-used outlier injection approach has a serious data leakage issue. We propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model and attribute reconstruction model to detect outliers in a balanced way.
arXiv Detail & Related papers (2022-10-24T04:09:35Z)
Unsupervised Outlier Detection using Memory and Contrastive Learning [53.77693158251706]
We think outlier detection can be done in the feature space by measuring the feature distance between outliers and inliers. We propose a framework, MCOD, using a memory module and a contrastive learning module. Our proposed MCOD achieves a considerable performance and outperforms nine state-of-the-art methods.
arXiv Detail & Related papers (2021-07-27T07:35:42Z)
Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare. In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples. To tackle this problem, we build a robust one-class classification framework via data refinement. We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
Do We Really Need to Learn Representations from In-domain Data for Outlier Detection? [6.445605125467574]
Methods based on the two-stage framework achieve state-of-the-art performance on this task. We explore the possibility of avoiding the high cost of training a distinct representation for each outlier detection task. In experiments, we demonstrate competitive or better performance on a variety of outlier detection benchmarks compared with previous two-stage methods.
arXiv Detail & Related papers (2021-05-19T17:30:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.