Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy
- URL: http://arxiv.org/abs/2512.18978v1
- Date: Mon, 22 Dec 2025 02:41:43 GMT
- Title: Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy
- Authors: Baiyang Chen, Zhong Yuan, Zheng Liu, Dezhong Peng, Yongxiang Li, Chang Liu, Guiduo Duan
- Abstract summary: Outlier detection is a critical task in data mining, aimed at identifying objects that significantly deviate from the norm. This paper introduces a semi-supervised outlier detection method, namely fuzzy rough sets-based outlier detection (FROD). Experimental results on 16 public datasets show that FROD is comparable with or better than leading detection algorithms.
- Score: 44.721694491724406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Outlier detection is a critical task in data mining, aimed at identifying objects that significantly deviate from the norm. Semi-supervised methods improve detection performance by leveraging partially labeled data but typically overlook the uncertainty and heterogeneity of real-world mixed-attribute data. This paper introduces a semi-supervised outlier detection method, namely fuzzy rough sets-based outlier detection (FROD), to effectively handle these challenges. Specifically, we first utilize a small subset of labeled data to construct fuzzy decision systems, through which we introduce the attribute classification accuracy based on fuzzy approximations to evaluate the contribution of attribute sets in outlier detection. Unlabeled data is then used to compute fuzzy relative entropy, which provides a characterization of outliers from the perspective of uncertainty. Finally, we develop the detection algorithm by combining attribute classification accuracy with fuzzy relative entropy. Experimental results on 16 public datasets show that FROD is comparable with or better than leading detection algorithms. All datasets and source codes are accessible at https://github.com/ChenBaiyang/FROD. This manuscript is the accepted author version of a paper published by Elsevier. The final published version is available at https://doi.org/10.1016/j.ijar.2025.109373
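The pipeline described in the abstract (labeled data scores each attribute, unlabeled data supplies an entropy-based outlier signal, and the two are combined) can be illustrated with a minimal sketch. This is not the authors' implementation; the reference code is in the FROD repository linked above. The similarity function, the `eps` threshold, the accuracy ratio, and the leave-one-out entropy drop used here are assumptions drawn from standard fuzzy rough set definitions, standing in for the paper's attribute classification accuracy and fuzzy relative entropy. All function names are hypothetical.

```python
import numpy as np

def fuzzy_relation(col, nominal, eps=0.3):
    """Pairwise fuzzy similarity on one attribute (numeric values assumed in [0, 1])."""
    col = np.asarray(col)
    if nominal:
        # Nominal attribute: objects are similar only when values match exactly.
        return (col[:, None] == col[None, :]).astype(float)
    diff = np.abs(col[:, None].astype(float) - col[None, :].astype(float))
    # Numeric attribute: similarity decays with distance and is cut off at eps (assumed).
    return np.where(diff <= eps, 1.0 - diff, 0.0)

def approximation_accuracy(rel, labels):
    """Summed fuzzy lower approximations divided by summed upper approximations."""
    labels = np.asarray(labels)
    lower_total, upper_total = 0.0, 0.0
    for cls in np.unique(labels):
        member = (labels == cls).astype(float)  # crisp decision class
        lower = np.min(np.maximum(1.0 - rel, member[None, :]), axis=1)
        upper = np.max(np.minimum(rel, member[None, :]), axis=1)
        lower_total += lower.sum()
        upper_total += upper.sum()
    return lower_total / upper_total

def fuzzy_entropy(rel):
    """Fuzzy information entropy of a reflexive fuzzy relation."""
    n = rel.shape[0]
    return -np.mean(np.log2(rel.sum(axis=1) / n))

def leave_one_out_entropy_drop(rel):
    """Entropy decrease when each object is removed (stand-in for relative entropy)."""
    n = rel.shape[0]
    full = fuzzy_entropy(rel)
    drops = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        drops[i] = full - fuzzy_entropy(rel[np.ix_(keep, keep)])
    return drops

def outlier_scores(columns, nominal_flags, labeled_idx, labels):
    """Weight each attribute's entropy signal by its accuracy on the labeled subset."""
    scores = np.zeros(len(columns[0]))
    for col, nominal in zip(columns, nominal_flags):
        rel = fuzzy_relation(col, nominal)
        weight = approximation_accuracy(rel[np.ix_(labeled_idx, labeled_idx)], labels)
        scores += weight * leave_one_out_entropy_drop(rel)
    return scores

# Toy mixed-attribute data: four similar objects and one deviating object (index 4).
columns = [
    np.array([0.10, 0.12, 0.11, 0.13, 0.95]),  # numeric attribute
    np.array(["a", "a", "a", "a", "b"]),       # nominal attribute
]
# Only objects 0 and 4 are labeled (0 = normal, 1 = outlier).
scores = outlier_scores(columns, [False, True], labeled_idx=[0, 4], labels=[0, 1])
print(int(np.argmax(scores)))  # → 4
```

On this toy data the deviating object receives the highest score, because removing it lowers the fuzzy entropy of both attributes. The sketch handles one attribute at a time for brevity, whereas the abstract speaks of evaluating attribute sets.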
Related papers
- Consistency-guided semi-supervised outlier detection in heterogeneous data using fuzzy rough sets [45.9876416284051]
Outlier detection aims to find samples that behave differently from the majority of the data. Semi-supervised detection methods can utilize the supervision of partial labels, thus reducing false positive rates. We propose a consistency-guided outlier detection algorithm (COD) for heterogeneous data with the fuzzy rough set theory in a semi-supervised manner.
arXiv Detail & Related papers (2025-12-22T02:41:08Z)
- Label-Informed Outlier Detection Based on Granule Density [43.94053430193935]
This paper introduces a label-informed outlier detection method for heterogeneous data based on Granular Computing and Fuzzy Sets. Experimental results on various real-world datasets show that GDOF stands out in detecting outliers in heterogeneous data with a minimal number of labeled outliers.
arXiv Detail & Related papers (2025-12-21T15:27:06Z)
- Kernel Representation and Similarity Measure for Incomplete Data [55.62595187178638]
Measuring similarity between incomplete data is a fundamental challenge in web mining, recommendation systems, and user behavior analysis. Traditional approaches either discard incomplete data or perform imputation as a preprocessing step, leading to information loss and biased similarity estimates. This paper presents a new similarity measure that directly computes similarity between incomplete data in kernel feature space without explicit imputation in the original space.
arXiv Detail & Related papers (2025-10-15T09:41:23Z)
- Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data. Most unsupervised outlier detection methods are carefully designed to detect specified outliers. We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
- CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD [0.3749861135832073]
Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset.
To address this problem, we propose the robust outlier detection algorithm CoMadOut.
Our approach can be seen as a robust alternative for outlier detection tasks.
arXiv Detail & Related papers (2022-11-23T21:33:34Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Homophily Outlier Detection in Non-IID Categorical Data [43.51919113927003]
This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
arXiv Detail & Related papers (2021-03-21T23:29:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.