Granular conditional entropy-based attribute reduction for partially
labeled data with proxy labels
- URL: http://arxiv.org/abs/2101.09495v1
- Date: Sat, 23 Jan 2021 12:50:09 GMT
- Title: Granular conditional entropy-based attribute reduction for partially
labeled data with proxy labels
- Authors: Can Gao, Jie Zhou, Duoqian Miao, Xiaodong Yue, Jun Wan
- Abstract summary: We propose a rough sets-based semi-supervised attribute reduction method for partially labeled data.
A novel conditional entropy measure is proposed, and its monotonicity is proved in theory.
Experiments conducted on UCI data sets demonstrate that the proposed semi-supervised attribute reduction method is promising.
- Score: 12.755874217721054
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Attribute reduction is one of the most important research topics in the
theory of rough sets, and many rough sets-based attribute reduction methods
have thus been presented. However, most of them are specifically designed for
dealing with either labeled data or unlabeled data, while many real-world
applications come in the form of partial supervision. In this paper, we propose
a rough sets-based semi-supervised attribute reduction method for partially
labeled data. Particularly, with the aid of prior class distribution
information about data, we first develop a simple yet effective strategy to
produce the proxy labels for unlabeled data. Then the concept of information
granularity is integrated into the information-theoretic measure, based on
which, a novel granular conditional entropy measure is proposed, and its
monotonicity is proved in theory. Furthermore, a fast heuristic algorithm is
provided to generate the optimal reduct of partially labeled data, which could
accelerate the process of attribute reduction by removing irrelevant examples
and excluding redundant attributes simultaneously. Extensive experiments
conducted on UCI data sets demonstrate that the proposed semi-supervised
attribute reduction method is promising and even compares favourably with the
supervised methods on labeled data and unlabeled data with true labels in terms
of classification performance.
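The listing does not give the paper's granular conditional entropy formula, so as a hedged sketch here is the classical rough-set conditional entropy H(D | B) computed over equivalence classes, plus a greedy forward-selection heuristic of the kind the abstract's "fast heuristic algorithm" suggests. The function names, the stopping tolerance, and the plain forward-selection strategy are illustrative assumptions, not the paper's actual granular measure or acceleration scheme (and the proxy-label step is not shown).

```python
from collections import defaultdict
from math import log2

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def conditional_entropy(rows, labels, attrs):
    """Classical rough-set conditional entropy H(D | attrs).

    With attrs empty, the whole universe is one block and this
    reduces to the plain entropy of the decision labels H(D).
    """
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        counts = defaultdict(int)
        for i in block:
            counts[labels[i]] += 1
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * log2(p)
    return h

def greedy_reduct(rows, labels, n_attrs):
    """Forward selection: repeatedly add the attribute that most
    reduces H(D | B); stop when no attribute lowers it further."""
    selected, remaining = [], list(range(n_attrs))
    best = conditional_entropy(rows, labels, [])
    while remaining:
        cand = min(remaining,
                   key=lambda a: conditional_entropy(rows, labels, selected + [a]))
        h = conditional_entropy(rows, labels, selected + [cand])
        if h >= best - 1e-12:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = h
    return selected
```

On a toy table where the decision depends only on the first attribute, the heuristic keeps that attribute and discards the redundant one, which is the reduct behavior the abstract describes.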
Related papers
- Causal Effect Regularization: Automated Detection and Removal of
Spurious Attributes [13.852987916253685]
In many classification datasets, the task labels are spuriously correlated with some input attributes.
We propose a method to automatically identify spurious attributes by estimating their causal effect on the label.
Our method mitigates the reliance on spurious attributes even under noisy estimation of causal effects.
arXiv Detail & Related papers (2023-06-19T17:17:42Z)
- Mitigating Algorithmic Bias with Limited Annotations [65.060639928772]
When sensitive attributes are not disclosed or available, a small portion of the training data must be manually annotated to mitigate bias.
We propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias.
APOD shows comparable performance to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited.
arXiv Detail & Related papers (2022-07-20T16:31:19Z)
- Reprint: a randomized extrapolation based on principal components for
data augmentation [11.449992652644577]
This paper presents a simple and effective hidden-space data augmentation method for imbalanced data classification.
Given hidden-space representations of samples in each class, REPRINT extrapolates augmented examples for the target class in a randomized fashion.
The method also includes a label refinement component that synthesizes new soft labels for the augmented examples.
arXiv Detail & Related papers (2022-04-26T01:38:47Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms a state-of-the-art one-class classification method by 6.3 points in AUC and 12.5 points in average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data, which are then used as noisy-labeled data to train a deep neural network.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
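This listing does not describe how the initial pseudo-labels are assigned; as a minimal sketch, one common heuristic labels an unlabeled point positive when it lies close to a known positive example. The distance rule and threshold below are illustrative assumptions, not the paper's actual procedure.

```python
def pseudo_label(positives, unlabeled, threshold):
    """Assign label 1 to each unlabeled point within `threshold`
    (Euclidean distance) of some known positive example, else 0.
    The resulting labels are treated as noisy training labels."""
    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    labels = []
    for u in unlabeled:
        nearest = min(dist2(u, p) for p in positives)
        labels.append(1 if nearest <= threshold ** 2 else 0)
    return labels
```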
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
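The blurb only names the two objective terms, so as a hedged sketch here is one plausible form of that objective: a self-reconstruction error through a projection matrix W plus the $l_{2,p}$ penalty on W's rows, whose norms score feature importance. The exact model, the function names, and the use of $\|W\|_{2,p}^p$ (rather than the norm itself) are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def l2p_norm(W, p):
    """||W||_{2,p}^p = sum over rows i of (||w_i||_2)^p."""
    return float(np.sum(np.linalg.norm(W, axis=1) ** p))

def objective(X, W, lam, p):
    """Reconstruction error ||X - X W W^T||_F^2 plus lam * ||W||_{2,p}^p.

    Rows of W (one per feature) that the penalty drives toward zero
    mark features that can be discarded.
    """
    recon = X @ W @ W.T
    err = float(np.linalg.norm(X - recon, "fro") ** 2)
    return err + lam * l2p_norm(W, p)
```

With W the identity, reconstruction is perfect and the objective reduces to the penalty term alone, which makes the two-term trade-off easy to check by hand.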
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Gradient Descent in RKHS with Importance Labeling [58.79085525115987]
We study the importance labeling problem, in which a large amount of unlabeled data is given.
We propose a new importance labeling scheme that can effectively select an informative subset of unlabeled data.
arXiv Detail & Related papers (2020-06-19T01:55:00Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
- Data Augmentation Imbalance For Imbalanced Attribute Classification [60.71438625139922]
We propose a new re-sampling algorithm called: data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the fewer attributes.
Our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets.
arXiv Detail & Related papers (2020-04-19T20:43:29Z)
- Clustering and Classification with Non-Existence Attributes: A Sentenced
Discrepancy Measure Based Technique [0.0]
Clustering approaches cannot be applied directly to data with Non-Existence attributes without pre-processing by techniques such as imputation or marginalization.
We overcome this drawback with a sentenced discrepancy measure that we refer to as the Attribute Weighted Penalty based Discrepancy (AWPD).
This technique makes it possible to apply our method directly to datasets with Non-Existence attributes, and to detect unstructured Non-Existence attributes with the best accuracy rate and minimum cost.
arXiv Detail & Related papers (2020-02-24T17:56:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.