Permutation-based multi-objective evolutionary feature selection for high-dimensional data
- URL: http://arxiv.org/abs/2501.14310v1
- Date: Fri, 24 Jan 2025 08:11:28 GMT
- Title: Permutation-based multi-objective evolutionary feature selection for high-dimensional data
- Authors: Raquel Espinosa, Gracia Sánchez, José Palma, Fernando Jiménez
- Abstract summary: We propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach.
The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets.
The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets.
- Abstract: Feature selection is a critical step in the analysis of high-dimensional data, where the number of features often vastly exceeds the number of samples. Effective feature selection not only improves model performance and interpretability but also reduces computational costs and mitigates the risk of overfitting. In this context, we propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach, but extending it to evaluate subsets of attributes rather than individual features. This extension more effectively captures how interactions among features influence model performance. The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets, with the objectives of maximizing the degradation in model performance when the selected features are shuffled, and minimizing the cardinality of the feature subset. The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets for classification and regression tasks, and compared against 9 well-established feature selection methods designed for high-dimensional problems, including the conventional permutation feature importance method. The results demonstrate the ability of our approach to balance accuracy and computational efficiency, providing a powerful tool for feature selection in complex, high-dimensional datasets.
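To make the evaluation concrete, below is a minimal sketch of how a candidate subset could be scored under the two stated objectives. The model, metric, shuffling scheme, and function names are illustrative assumptions rather than the authors' implementation; in the actual method, a multi-objective evolutionary algorithm (e.g., NSGA-II) would search over subsets using objective values like these.

```python
# Illustrative sketch (assumed details): subset-level permutation importance
# used as a fitness function for a multi-objective search over feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def subset_permutation_degradation(model, X_val, y_val, subset, n_repeats=5, seed=0):
    """Mean drop in accuracy when every feature in `subset` is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y_val, model.predict(X_val))
    drops = []
    for _ in range(n_repeats):
        X_perm = X_val.copy()
        for j in subset:
            # Shuffle each selected column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops.append(baseline - accuracy_score(y_val, model.predict(X_perm)))
    return float(np.mean(drops))

def objectives(model, X_val, y_val, subset):
    # Both objectives phrased for minimization, as usual in NSGA-II-style EAs:
    # f1 = -degradation (i.e., maximize degradation), f2 = |subset| (minimize size).
    return (-subset_permutation_degradation(model, X_val, y_val, subset), len(subset))

X, y = make_classification(n_samples=300, n_features=100, n_informative=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(objectives(model, X_val, y_val, subset=[0, 1, 2]))
```

Shuffling the whole subset jointly, rather than one column at a time, is what lets the score reflect interactions among the selected features, which conventional per-feature permutation importance misses.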
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to distill feature selection knowledge into a continuous embedding space for efficient search.
arXiv Detail & Related papers (2024-04-26T05:01:08Z)
- Feature Selection as Deep Sequential Generative Learning
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features exhibit between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation.
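To illustrate the flavour of a contrast-based filter score (the summary does not give the exact formula, so the form below is an assumption, not the published ContrastFS criterion), one could rank features by the spread of their class-conditional means:

```python
import numpy as np

def contrast_score(X, y):
    """Assumed contrast-style score: spread of class-conditional feature means.

    A guess at the flavour of ContrastFS, not the published formula.
    """
    class_means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    return class_means.max(axis=0) - class_means.min(axis=0)  # one score per feature

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X[y == 1, 0] += 2.0                           # make feature 0 discriminative
print(np.argsort(contrast_score(X, y))[-5:])  # indices of 5 highest-contrast features
```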
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
- Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimizes the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z)
- Causal Feature Selection via Transfer Entropy
Causal discovery aims to identify causal relationships between features from observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
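As a rough sketch of how the forward pass of such a procedure might look (the estimator and the exact procedure are not specified in this summary; `te_estimate` below is a hypothetical stand-in for a transfer entropy estimator):

```python
import numpy as np

def forward_select(X, y, te_estimate, k):
    """Greedy forward pass driven by a transfer-entropy-style score.

    `te_estimate(x, y)` is a hypothetical helper; the paper's actual
    forward/backward procedure and estimator may differ.
    """
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: te_estimate(X[:, j], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Crude stand-in estimator for this demo only (|correlation|, not transfer entropy).
stand_in = lambda x, y: abs(np.corrcoef(x, y)[0, 1])
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 3] + 0.1 * rng.normal(size=200)
print(forward_select(X, y, stand_in, k=2))  # should pick feature 3 first
```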
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Multivariate feature ranking of gene expression data
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We show, with statistical significance, that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Top-$k$ Regularization for Supervised Feature Selection
We introduce top-$k$ regularization, a novel, simple yet effective regularization approach for supervised feature selection.
We show that the top-$k$ regularization is effective and stable for supervised feature selection.
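One plausible form of such a regularizer (assumed here for illustration, not taken from the paper) penalizes every weight magnitude except the $k$ largest, so the penalty vanishes exactly when at most $k$ features carry nonzero weight:

```python
import numpy as np

def top_k_penalty(w, k):
    """Assumed top-k regularizer: L1 mass outside the k largest |w_i|.

    The penalty is zero exactly when at most k weights are nonzero,
    which encourages selecting at most k features.
    """
    mags = np.abs(np.asarray(w))
    return mags.sum() - np.sort(mags)[-k:].sum()

print(top_k_penalty([0.9, 0.0, 0.3, 0.05], k=2))  # 0.05: only {0.0, 0.05} are penalized
```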
arXiv Detail & Related papers (2021-06-04T01:12:47Z)
- Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
An optimal subset of features is selected group-wise, and the number of selected features is determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.