Label-Informed Outlier Detection Based on Granule Density
- URL: http://arxiv.org/abs/2512.18774v1
- Date: Sun, 21 Dec 2025 15:27:06 GMT
- Title: Label-Informed Outlier Detection Based on Granule Density
- Authors: Baiyang Chen, Zhong Yuan, Dezhong Peng, Hongmei Chen, Xiaomin Song, Huiming Zheng
- Abstract summary: This paper introduces a label-informed outlier detection method for heterogeneous data based on Granular Computing and Fuzzy Sets. Experimental results on various real-world datasets show that GDOF stands out in detecting outliers in heterogeneous data with a minimal number of labeled outliers.
- Score: 43.94053430193935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Outlier detection, crucial for identifying unusual patterns with significant implications across numerous applications, has drawn considerable research interest. Existing semi-supervised methods typically treat data as purely numerical and in a deterministic manner, thereby neglecting the heterogeneity and uncertainty inherent in complex, real-world datasets. This paper introduces a label-informed outlier detection method for heterogeneous data based on Granular Computing and Fuzzy Sets, namely Granule Density-based Outlier Factor (GDOF). Specifically, GDOF first employs label-informed fuzzy granulation to effectively represent various data types and develops granule density for precise density estimation. Subsequently, granule densities from individual attributes are integrated for outlier scoring by assessing attribute relevance with a limited number of labeled outliers. Experimental results on various real-world datasets show that GDOF stands out in detecting outliers in heterogeneous data with a minimal number of labeled outliers. The integration of Fuzzy Sets and Granular Computing in GDOF offers a practical framework for outlier detection in complex and diverse data types. All relevant datasets and source codes are publicly available for further research. This is the author's accepted manuscript of a paper published in IEEE Transactions on Fuzzy Systems. The final version is available at https://doi.org/10.1109/TFUZZ.2024.3514853
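The pipeline outlined in the abstract (per-attribute fuzzy granulation, granule density, then label-informed attribute weighting) can be loosely illustrated with a toy sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the similarity function, the density formula (mean similarity to all samples), the weighting rule, and the names `similarity`, `granule_densities`, and `outlier_scores` are all hypothetical. The paper's own definitions may differ substantially.

```python
# Illustrative sketch only: the similarity, density, and weighting formulas
# below are assumptions, not the definitions used in the GDOF paper.

def similarity(a, b, value_range):
    """Fuzzy similarity in [0, 1] between two values of one attribute."""
    if value_range is None:          # categorical attribute: crisp match
        return 1.0 if a == b else 0.0
    if value_range == 0:             # constant numeric attribute
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def granule_densities(column, is_numeric):
    """Assumed density of each sample's fuzzy granule: mean similarity to all samples."""
    value_range = (max(column) - min(column)) if is_numeric else None
    n = len(column)
    return [sum(similarity(x, y, value_range) for y in column) / n for x in column]


def outlier_scores(rows, numeric_flags, labeled_outliers):
    """Combine per-attribute densities into one score per row (higher = more outlying)."""
    columns = list(zip(*rows))
    densities = [granule_densities(list(c), f) for c, f in zip(columns, numeric_flags)]

    # Assumed attribute relevance: how much sparser the labeled outliers are
    # than the average sample on that attribute (clipped at zero).
    raw = []
    for d in densities:
        mean_all = sum(d) / len(d)
        mean_out = sum(d[i] for i in labeled_outliers) / len(labeled_outliers)
        raw.append(max(mean_all - mean_out, 0.0))
    total = sum(raw)
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(raw)] * len(raw)

    # Low density on a relevant attribute contributes a high outlier score.
    return [
        sum(w * (1.0 - d[i]) for w, d in zip(weights, densities))
        for i in range(len(rows))
    ]


# Mixed numeric/categorical toy data; row 4 is the single labeled outlier.
rows = [
    (1.0, "a"), (1.1, "a"), (0.9, "a"), (1.2, "a"), (5.0, "b"), (4.8, "a"),
]
scores = outlier_scores(rows, numeric_flags=[True, False], labeled_outliers=[4])
```

In this toy data the labeled row receives the highest score, and the unlabeled row with an unusual numeric value scores above the typical rows, which is the qualitative behavior the abstract attributes to combining attribute-wise densities with a few labeled outliers.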
Related papers
- Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy [44.721694491724406]
Outlier detection is a critical task in data mining, aimed at identifying objects that significantly deviate from the norm. This paper introduces a semi-supervised outlier detection method, namely fuzzy rough sets-based outlier detection (FROD). Experimental results on 16 public datasets show that FROD is comparable with or better than leading detection algorithms.
arXiv Detail & Related papers (2025-12-22T02:41:43Z)
- Consistency-guided semi-supervised outlier detection in heterogeneous data using fuzzy rough sets [45.9876416284051]
Outlier detection aims to find samples that behave differently from the majority of the data. Semi-supervised detection methods can utilize the supervision of partial labels, thus reducing false positive rates. We propose a consistency-guided outlier detection algorithm (COD) for heterogeneous data with the fuzzy rough set theory in a semi-supervised manner.
arXiv Detail & Related papers (2025-12-22T02:41:08Z)
- Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data. Most unsupervised outlier detection methods are carefully designed to detect specified outliers. We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
- Conditional Semi-Supervised Data Augmentation for Spam Message Detection with Low Resource Data [0.0]
We propose a conditional semi-supervised data augmentation for a spam detection model lacking the availability of data.
We exploit unlabeled data for data augmentation to extend training data.
Latent variables can come from labeled and unlabeled data as the input for the final classifier.
arXiv Detail & Related papers (2024-07-06T07:51:24Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Homophily Outlier Detection in Non-IID Categorical Data [43.51919113927003]
This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
arXiv Detail & Related papers (2021-03-21T23:29:33Z)
- Self-training Avoids Using Spurious Features Under Domain Shift [54.794607791641745]
In unsupervised domain adaptation, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
We identify and analyze one particular setting where the domain shift can be large, but certain spurious features correlate with the label in the source domain and are independent of the label in the target.
arXiv Detail & Related papers (2020-06-17T17:51:42Z)