A Novel Community Detection Based Genetic Algorithm for Feature Selection
- URL: http://arxiv.org/abs/2008.03543v1
- Date: Sat, 8 Aug 2020 15:39:30 GMT
- Title: A Novel Community Detection Based Genetic Algorithm for Feature Selection
- Authors: Mehrdad Rostami, Kamal Berahmand, Saman Forouzandeh
- Abstract summary: The authors propose a genetic algorithm based on community detection that works in three steps.
Its performance was evaluated on nine benchmark classification problems.
- Score: 3.8848561367220276
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature selection is an essential data preprocessing stage in data
mining. Its core principle is to pick a subset of the available features by
excluding features with almost no predictive information as well as redundant
features that are highly correlated with others. In the past several years, a
variety of meta-heuristic methods have been introduced to eliminate redundant and
irrelevant features as much as possible from high-dimensional datasets. One of
the main disadvantages of existing meta-heuristic approaches is that they
often neglect the correlation among the selected features. In
this article, the authors propose a genetic algorithm for feature selection
based on community detection, which works in three steps.
In the first step, the similarities between features are calculated. In the
second step, community detection algorithms group the features into clusters.
In the third step, features are picked by a genetic algorithm with
a new community-based repair operation. The performance of the presented
approach was evaluated on nine benchmark classification problems. The
authors also compared the efficiency of the proposed approach with that of
four existing feature selection algorithms. The findings indicate
that the new approach consistently yields improved classification accuracy.
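The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: absolute Pearson correlation as the similarity measure, connected components of a thresholded similarity graph as a stand-in for a real community detection algorithm, and a repair rule that keeps at most `max_per_group` selected features per community. The function names and the `threshold` parameter are hypothetical, and the genetic algorithm itself (fitness, selection, crossover, mutation) is omitted.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def similarity_matrix(columns):
    """Step 1 (assumed measure): pairwise |Pearson r| between features."""
    m = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]


def communities(sim, threshold=0.8):
    """Step 2 stand-in: connected components of the graph that links
    two features whenever their similarity reaches the threshold."""
    m = len(sim)
    label = [-1] * m
    groups = []
    for start in range(m):
        if label[start] != -1:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(m):
                if label[j] == -1 and sim[i][j] >= threshold:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups


def repair(chromosome, groups, max_per_group=1, rng=random):
    """Step 3 stand-in: deselect surplus features so that each community
    contributes at most max_per_group selected features."""
    fixed = list(chromosome)
    for members in groups:
        chosen = [i for i in members if fixed[i]]
        rng.shuffle(chosen)
        for i in chosen[max_per_group:]:
            fixed[i] = 0
    return fixed


# Toy data: three feature columns over four samples (made-up values).
columns = [
    [1.0, 2.0, 3.0, 4.0],  # f0
    [2.0, 4.1, 6.0, 8.2],  # f1, nearly a multiple of f0
    [4.0, 1.0, 3.0, 2.0],  # f2, only weakly correlated with f0 and f1
]
groups = communities(similarity_matrix(columns))
repaired = repair([1, 1, 1], groups, rng=random.Random(0))
```

In this toy example f0 and f1 fall into one community and f2 forms its own, so repairing the all-ones chromosome drops one of the two redundant features and keeps f2. In a full genetic algorithm a repair of this kind would presumably run on each offspring after crossover and mutation, before fitness evaluation.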
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on the search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Sequential Attention for Feature Selection [12.89764845700709]
We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks.
We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm.
arXiv Detail & Related papers (2022-09-29T15:49:06Z)
- Fair Feature Subset Selection using Multiobjective Genetic Algorithm [0.0]
We present a feature subset selection approach that improves both fairness and accuracy objectives.
We use statistical disparity as a fairness metric and F1-Score as a metric for model performance.
Our experiments on the most commonly used fairness benchmark datasets show that using the evolutionary algorithm we can effectively explore the trade-off between fairness and accuracy.
arXiv Detail & Related papers (2022-04-30T22:51:19Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state of the art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- An Evolutionary Correlation-aware Feature Selection Method for Classification Problems [3.2550305883611244]
In this paper, an estimation of distribution algorithm is proposed to meet three goals.
Firstly, as an extension of EDA, the proposed method generates only two individuals in each iteration that compete based on a fitness function.
Secondly, we provide a guiding technique for determining the number of features for individuals in each iteration.
As the main contribution of the paper, in addition to considering the importance of each feature alone, the proposed method can consider the interaction between features.
arXiv Detail & Related papers (2021-10-16T20:20:43Z)
- A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)