A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- URL: http://arxiv.org/abs/2111.08169v1
- Date: Wed, 10 Nov 2021 15:05:15 GMT
- Title: A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering
- Authors: Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, and Abdollah
Homaifar
- Abstract summary: This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
- Score: 1.3048920509133808
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature selection methods are widely used to address the high computational
overheads and curse of dimensionality in classifying high-dimensional data.
Most conventional feature selection methods focus on handling homogeneous
features, while real-world datasets usually have a mixture of continuous and
discrete features. Some recent mixed-type feature selection studies only select
features with high relevance to class labels and ignore the redundancy among
features. The determination of an appropriate feature subset is also a
challenge. In this paper, a supervised feature selection method using
density-based feature clustering (SFSDFC) is proposed to obtain an appropriate
final feature subset for mixed-type data. SFSDFC decomposes the feature space
into a set of disjoint feature clusters using a novel density-based clustering
method. Then, an effective feature selection strategy is employed to obtain a
subset of important features with minimal redundancy from those feature
clusters. Extensive experiments as well as comparison studies with five
state-of-the-art methods are conducted on SFSDFC using thirteen real-world
benchmark datasets and results justify the efficacy of the SFSDFC method.
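The cluster-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SFSDFC algorithm: it substitutes a crude correlation-threshold grouping for their density-based feature clustering, uses label correlation as a stand-in for their relevance measure, and handles only continuous features.

```python
import numpy as np

def cluster_then_select(X, y, redundancy_thresh=0.8):
    """Illustrative cluster-then-select sketch (NOT the authors' SFSDFC).

    1. Group features whose absolute pairwise correlation exceeds
       `redundancy_thresh` (a crude stand-in for density-based
       feature clustering).
    2. From each group, keep the single feature most correlated with
       the label (a crude stand-in for the relevance-based selection).
    """
    n_features = X.shape[1]
    # absolute pairwise correlation between features (redundancy proxy)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # relevance proxy: absolute correlation of each feature with the label
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    # greedy connected-components grouping on the redundancy graph
    unassigned = set(range(n_features))
    selected = []
    while unassigned:
        seed = unassigned.pop()
        cluster = {seed}
        frontier = [seed]
        while frontier:
            f = frontier.pop()
            neighbors = {j for j in unassigned if corr[f, j] > redundancy_thresh}
            unassigned -= neighbors
            cluster |= neighbors
            frontier.extend(neighbors)
        # keep the most label-relevant feature of each cluster
        selected.append(max(cluster, key=lambda j: relevance[j]))
    return sorted(selected)

# toy example: feature 1 duplicates feature 0; feature 2 is independent noise
rng = np.random.default_rng(0)
f0 = rng.normal(size=200)
y = (f0 > 0).astype(int)
X = np.column_stack([f0, f0 + 1e-3 * rng.normal(size=200), rng.normal(size=200)])
print(cluster_then_select(X, y))  # features 0 and 1 collapse into one cluster
```

The redundant copy of feature 0 is merged into its cluster and dropped, while the independent noise feature survives as its own cluster representative.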
Related papers
- Permutation-based multi-objective evolutionary feature selection for high-dimensional data [43.18726655647964]
We propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach.
The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets.
The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets.
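The permutation feature importance idea this method builds on can be sketched as follows. The model, scorer, and data here are toy stand-ins, and the multi-objective evolutionary subset search is omitted.

```python
import numpy as np

def permutation_importance(X, y, predict, score, n_repeats=10, seed=0):
    """Permutation feature importance: the drop in score when one
    feature column is shuffled, averaged over several repeats."""
    rng = np.random.default_rng(seed)
    base = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature-label link
            drops.append(base - score(y, predict(Xp)))
        importances[j] = np.mean(drops)
    return importances

# toy model: classify by the sign of feature 0 (feature 1 is ignored)
predict = lambda X: (X[:, 0] > 0).astype(int)
accuracy = lambda y, yhat: np.mean(y == yhat)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)
imp = permutation_importance(X, y, predict, accuracy)
print(imp)  # feature 0 has large importance, feature 1 exactly zero
```

Shuffling the informative column destroys the model's accuracy, while shuffling the ignored column changes nothing, which is the signal a permutation-based selector ranks features by.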
arXiv Detail & Related papers (2025-01-24T08:11:28Z)
- Feature Selection for Latent Factor Models [2.07180164747172]
Feature selection is crucial for pinpointing relevant features in high-dimensional datasets.
Traditional feature selection methods for classification use data from all classes to select features for each class.
This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative methods.
arXiv Detail & Related papers (2024-12-13T13:20:10Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes.
We propose a simple yet effective method, termed the Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Graph-based Extreme Feature Selection for Multi-class Classification Tasks [7.863638253070439]
This work focuses on a graph-based filter feature selection method suited for multi-class classification tasks.
We aim to drastically reduce the number of selected features, in order to create a sketch of the original data that codes valuable information for the classification task.
arXiv Detail & Related papers (2023-03-03T09:06:35Z)
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small feature set in a fraction of the time taken by the other methods under comparison.
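The class-separability measure PFST builds on can be illustrated with a per-feature Fisher-style score (between-class over within-class scatter). This is a simplified stand-in, not PFST's actual trace ratio evaluated over feature subsets, and the parallelization is omitted.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher criterion: between-class scatter divided by
    within-class scatter (higher = better class separability)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    s_b = np.zeros(X.shape[1])  # between-class scatter per feature
    s_w = np.zeros(X.shape[1])  # within-class scatter per feature
    for c in classes:
        Xc = X[y == c]
        s_b += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        s_w += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return s_b / s_w

# toy data: one feature separates the classes, the other is pure noise
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
informative = np.where(y == 0, -2.0, 2.0) + rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([informative, noise])
print(fisher_scores(X, y))  # the informative feature scores far higher
```

Ranking features by this score and keeping the top-k is the classical filter use of the criterion; PFST instead uses the trace form to evaluate candidate subsets in parallel.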
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods: Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate discriminative parts, or feature-encoding approaches to extract highly parameterized features in a weakly supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Enhanced Classification Accuracy for Cardiotocogram Data with Ensemble Feature Selection and Classifier Ensemble [0.0]
The proposed approach consists of two phases: (i) to select feature sets that are likely to be the support vectors by applying ensemble based feature selection methods; and (ii) to construct an SVM ensemble using the selected features.
Four feature selection techniques were used: (i) Correlation-based, (ii) Consistency-based, (iii) ReliefF and (iv) Information Gain.
arXiv Detail & Related papers (2020-10-27T05:11:24Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.