A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- URL: http://arxiv.org/abs/2111.08169v1
- Date: Wed, 10 Nov 2021 15:05:15 GMT
- Title: A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- Authors: Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, and Abdollah
Homaifar
- Abstract summary: This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
- Score: 1.3048920509133808
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature selection methods are widely used to address the high computational
overheads and curse of dimensionality in classifying high-dimensional data.
Most conventional feature selection methods focus on handling homogeneous
features, while real-world datasets usually have a mixture of continuous and
discrete features. Some recent mixed-type feature selection studies only select
features with high relevance to class labels and ignore the redundancy among
features. The determination of an appropriate feature subset is also a
challenge. In this paper, a supervised feature selection method using
density-based feature clustering (SFSDFC) is proposed to obtain an appropriate
final feature subset for mixed-type data. SFSDFC decomposes the feature space
into a set of disjoint feature clusters using a novel density-based clustering
method. Then, an effective feature selection strategy is employed to obtain a
subset of important features with minimal redundancy from those feature
clusters. Extensive experiments as well as comparison studies with five
state-of-the-art methods are conducted on SFSDFC using thirteen real-world
benchmark datasets and results justify the efficacy of the SFSDFC method.
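
The two stages described in the abstract (group features into disjoint clusters, then keep relevant, non-redundant representatives) can be illustrated with a minimal sketch. This is not the authors' SFSDFC algorithm: the equal-width binning, the symmetric-uncertainty distance, the `eps` neighborhood grouping (equivalent to DBSCAN with a minimum neighborhood size of 1), the `min_relevance` cutoff, and all function names and toy data are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def discretize(column, bins=4):
    """Equal-width binning for numeric columns; categorical columns pass through."""
    if not all(isinstance(v, (int, float)) for v in column):
        return list(column)
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]

def entropy(xs):
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetric_uncertainty(xs, ys):
    """Mutual information normalized to [0, 1]."""
    total = entropy(xs) + entropy(ys)
    return 0.0 if total == 0 else 2 * mutual_info(xs, ys) / total

def cluster_features(features, eps=0.5):
    """Group features whose distance (1 - symmetric uncertainty) is within eps,
    following chains of neighbors. A stand-in for the paper's clustering step."""
    unassigned = set(features)
    clusters = []
    for start in features:
        if start not in unassigned:
            continue
        unassigned.remove(start)
        cluster, frontier = [start], [start]
        while frontier:
            cur = frontier.pop()
            near = sorted(
                f for f in unassigned
                if 1 - symmetric_uncertainty(features[cur], features[f]) <= eps
            )
            unassigned.difference_update(near)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def select_features(features, labels, eps=0.5, min_relevance=0.01):
    """Keep, per cluster, the feature most informative about the labels,
    dropping representatives whose relevance is below min_relevance."""
    disc = {name: discretize(col) for name, col in features.items()}
    selected = []
    for cluster in cluster_features(disc, eps):
        best = max(cluster, key=lambda f: mutual_info(disc[f], labels))
        if mutual_info(disc[best], labels) > min_relevance:
            selected.append(best)
    return selected

# Hypothetical toy data: two redundant continuous features, one irrelevant
# categorical feature, and one partially relevant categorical feature.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "height": [1.0, 1.2, 1.1, 1.3, 3.0, 3.2, 3.1, 3.3],
    "height_cm": [100, 120, 110, 130, 300, 320, 310, 330],
    "color": ["r", "b", "r", "b", "r", "b", "r", "b"],
    "smoker": ["n", "n", "n", "y", "n", "y", "y", "y"],
}
print(select_features(features, labels))
```

In this toy setting the redundant pair lands in one cluster and contributes a single representative, while the feature that carries no information about the labels is dropped. Discretizing first is one simple way to put continuous and discrete features on a common footing; the paper may handle mixed types differently.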
Related papers
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained with a joint objective that combines sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Graph-based Extreme Feature Selection for Multi-class Classification Tasks [7.863638253070439]
This work focuses on a graph-based filter feature selection method that is suited for multi-class classification tasks.
We aim to drastically reduce the number of selected features, in order to create a sketch of the original data that codes valuable information for the classification task.
arXiv Detail & Related papers (2023-03-03T09:06:35Z)
- ManiFeSt: Manifold-based Feature Selection for Small Data Sets [9.649457851261909]
We present a new method for few-sample supervised feature selection (FS).
Our method first learns the manifold of the feature space of each class using kernels capturing multi-feature associations.
We show that our FS leads to improved classification and better generalization when applied to test data.
arXiv Detail & Related papers (2022-07-18T12:58:01Z)
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Automated Supervised Feature Selection for Differentiated Patterns of Care [5.3825788156200565]
The pipeline included three types of feature selection techniques (filters, wrappers, and embedded methods) to select the top K features.
The selected features were tested in the existing multi-dimensional subset scanning (MDSS) where the most anomalous subpopulations, most anomalous subsets, propensity scores, and effect of measures were recorded to test their performance.
arXiv Detail & Related papers (2021-11-05T13:27:18Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Enhanced Classification Accuracy for Cardiotocogram Data with Ensemble Feature Selection and Classifier Ensemble [0.0]
The proposed approach consists of two phases: (i) to select feature sets that are likely to be the support vectors by applying ensemble based feature selection methods; and (ii) to construct an SVM ensemble using the selected features.
Four feature selection techniques were used: (i) Correlation-based, (ii) Consistency-based, (iii) ReliefF and (iv) Information Gain.
arXiv Detail & Related papers (2020-10-27T05:11:24Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.