A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- URL: http://arxiv.org/abs/2111.08169v1
- Date: Wed, 10 Nov 2021 15:05:15 GMT
- Title: A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- Authors: Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, and Abdollah
Homaifar
- Abstract summary: This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
- Score: 1.3048920509133808
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature selection methods are widely used to address the high computational
overheads and curse of dimensionality in classifying high-dimensional data.
Most conventional feature selection methods focus on handling homogeneous
features, while real-world datasets usually have a mixture of continuous and
discrete features. Some recent mixed-type feature selection studies only select
features with high relevance to class labels and ignore the redundancy among
features. The determination of an appropriate feature subset is also a
challenge. In this paper, a supervised feature selection method using
density-based feature clustering (SFSDFC) is proposed to obtain an appropriate
final feature subset for mixed-type data. SFSDFC decomposes the feature space
into a set of disjoint feature clusters using a novel density-based clustering
method. Then, an effective feature selection strategy is employed to obtain a
subset of important features with minimal redundancy from those feature
clusters. Extensive experiments as well as comparison studies with five
state-of-the-art methods are conducted on SFSDFC using thirteen real-world
benchmark datasets and results justify the efficacy of the SFSDFC method.
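The cluster-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SFSDFC algorithm: it substitutes a crude correlation-threshold grouping for their density-based feature clustering, uses label correlation as a stand-in for their relevance measure, and handles only continuous features.

```python
import numpy as np

def cluster_then_select(X, y, redundancy_thresh=0.8):
    """Illustrative cluster-then-select sketch (NOT the authors' SFSDFC).

    1. Group features whose absolute pairwise correlation exceeds
       `redundancy_thresh` (a crude stand-in for density-based
       feature clustering).
    2. From each group, keep the single feature most correlated with
       the label (a crude stand-in for the relevance-based selection).
    """
    n_features = X.shape[1]
    # absolute pairwise correlation between features (redundancy proxy)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # relevance proxy: absolute correlation of each feature with the label
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    # greedy connected-components grouping on the redundancy graph
    unassigned = set(range(n_features))
    selected = []
    while unassigned:
        seed = unassigned.pop()
        cluster = {seed}
        frontier = [seed]
        while frontier:
            f = frontier.pop()
            neighbors = {j for j in unassigned if corr[f, j] > redundancy_thresh}
            unassigned -= neighbors
            cluster |= neighbors
            frontier.extend(neighbors)
        # keep the most label-relevant feature of each cluster
        selected.append(max(cluster, key=lambda j: relevance[j]))
    return sorted(selected)

# toy example: feature 1 duplicates feature 0; feature 2 is independent noise
rng = np.random.default_rng(0)
f0 = rng.normal(size=200)
y = (f0 > 0).astype(int)
X = np.column_stack([f0, f0 + 1e-3 * rng.normal(size=200), rng.normal(size=200)])
print(cluster_then_select(X, y))  # features 0 and 1 collapse into one cluster
```

The redundant copy of feature 0 is merged into its cluster and dropped, while the independent noise feature survives as its own cluster representative.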
Related papers
- Permutation-based multi-objective evolutionary feature selection for high-dimensional data [43.18726655647964]
We propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach.
The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets.
The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets.
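The permutation feature importance idea this method builds on can be sketched as follows. The model, scorer, and data here are toy stand-ins, and the multi-objective evolutionary subset search is omitted.

```python
import numpy as np

def permutation_importance(X, y, predict, score, n_repeats=10, seed=0):
    """Permutation feature importance: the drop in score when one
    feature column is shuffled, averaged over several repeats."""
    rng = np.random.default_rng(seed)
    base = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature-label link
            drops.append(base - score(y, predict(Xp)))
        importances[j] = np.mean(drops)
    return importances

# toy model: classify by the sign of feature 0 (feature 1 is ignored)
predict = lambda X: (X[:, 0] > 0).astype(int)
accuracy = lambda y, yhat: np.mean(y == yhat)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)
imp = permutation_importance(X, y, predict, accuracy)
print(imp)  # feature 0 has large importance, feature 1 exactly zero
```

Shuffling the informative column destroys the model's accuracy, while shuffling the ignored column changes nothing, which is the signal a permutation-based selector ranks features by.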
arXiv Detail & Related papers (2025-01-24T08:11:28Z)
- Feature Selection for Latent Factor Models [2.07180164747172]
Feature selection is crucial for pinpointing relevant features in high-dimensional datasets.
Traditional feature selection methods for classification use data from all classes to select features for each class.
This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative methods.
arXiv Detail & Related papers (2024-12-13T13:20:10Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes.
We propose a simple yet effective method, termed the Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Graph-based Extreme Feature Selection for Multi-class Classification Tasks [7.863638253070439]
This work focuses on a graph-based filter feature selection method suited for multi-class classification tasks.
We aim to drastically reduce the number of selected features, in order to create a sketch of the original data that codes valuable information for the classification task.
arXiv Detail & Related papers (2023-03-03T09:06:35Z)
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small feature set in a fraction of the time taken by the other methods under comparison.
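The class-separability measure PFST builds on can be illustrated with a per-feature Fisher-style score (between-class over within-class scatter). This is a simplified stand-in, not PFST's actual trace ratio evaluated over feature subsets, and the parallelization is omitted.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher criterion: between-class scatter divided by
    within-class scatter (higher = better class separability)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    s_b = np.zeros(X.shape[1])  # between-class scatter per feature
    s_w = np.zeros(X.shape[1])  # within-class scatter per feature
    for c in classes:
        Xc = X[y == c]
        s_b += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        s_w += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return s_b / s_w

# toy data: one feature separates the classes, the other is pure noise
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
informative = np.where(y == 0, -2.0, 2.0) + rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([informative, noise])
print(fisher_scores(X, y))  # the informative feature scores far higher
```

Ranking features by this score and keeping the top-k is the classical filter use of the criterion; PFST instead uses the trace form to evaluate candidate subsets in parallel.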
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods: Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate discriminative parts, or feature-encoding approaches to extract highly parameterized features in a weakly supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Enhanced Classification Accuracy for Cardiotocogram Data with Ensemble Feature Selection and Classifier Ensemble [0.0]
The proposed approach consists of two phases: (i) to select feature sets that are likely to be the support vectors by applying ensemble based feature selection methods; and (ii) to construct an SVM ensemble using the selected features.
Four feature selection techniques were used: (i) Correlation-based, (ii) Consistency-based, (iii) ReliefF and (iv) Information Gain.
arXiv Detail & Related papers (2020-10-27T05:11:24Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.