Related papers: HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

URL: http://arxiv.org/abs/2510.18575v1
Date: Tue, 21 Oct 2025 12:30:22 GMT
Title: HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search
Authors: Yusi Fan, Tian Wang, Zhiying Yan, Chang Liu, Qiong Zhou, Qi Lu, Zhehao Guo, Ziqi Deng, Wenyu Zhu, Ruochi Zhang, Fengfeng Zhou,
Abstract summary: We introduce the HeFS (Helper-Enhanced Feature Selection) framework to refine feature subsets produced by existing algorithms.<n>HeFS systematically searches the residual feature space to identify a Helper Set - features that complement the original subset and improve classification performance.<n> Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies yet informative features and achieves superior performance over state-of-the-art methods.
Score: 10.751560953850925
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Feature selection is a combinatorial optimization problem that is NP-hard. Conventional approaches often employ heuristic or greedy strategies, which are prone to premature convergence and may fail to capture subtle yet informative features. This limitation becomes especially critical in high-dimensional datasets, where complex and interdependent feature relationships prevail. We introduce the HeFS (Helper-Enhanced Feature Selection) framework to refine feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space to identify a Helper Set - features that complement the original subset and improve classification performance. The approach employs a biased initialization scheme and a ratio-guided mutation mechanism within a genetic algorithm, coupled with Pareto-based multi-objective optimization to jointly maximize predictive accuracy and feature complementarity. Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies overlooked yet informative features and achieves superior performance over state-of-the-art methods, including in challenging domains such as gastric cancer classification, drug toxicity prediction, and computer science applications. The code and datasets are available at https://healthinformaticslab.org/supp/.

Related papers

Representation Quantization for Collaborative Filtering Augmentation [49.14087936092634]
We propose a novel two-stage collaborative recommendation algorithm, DQRec.<n>It augments features and homogeneous linkages by extracting behavior characteristics jointly from interaction sequences and attributes.<n>By integrating these semantic ID patterns into the recommendation process through feature and linkage augmentation, the system enriches both latent and explicit user and item features.
arXiv Detail & Related papers (2025-08-15T04:00:50Z)
Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets. This paper proposes a novel large-scale multi-objective evolutionary algorithm based on the search space shrinking, termed LMSSS. The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses. Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data. We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures. We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search. Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function. Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z)
An Evolutionary Correlation-aware Feature Selection Method for Classification Problems [3.2550305883611244]
In this paper, an estimation of distribution algorithm is proposed to meet three goals. Firstly, as an extension of EDA, the proposed method generates only two individuals in each iteration that compete based on a fitness function. Secondly, we provide a guiding technique for determining the number of features for individuals in each iteration. As the main contribution of the paper, in addition to considering the importance of each feature alone, the proposed method can consider the interaction between features.
arXiv Detail & Related papers (2021-10-16T20:20:43Z)
Top-$k$ Regularization for Supervised Feature Selection [11.927046591097623]
We introduce a novel, simple yet effective regularization approach, named top-$k$ regularization, to supervised feature selection. We show that the top-$k$ regularization is effective and stable for supervised feature selection.
arXiv Detail & Related papers (2021-06-04T01:12:47Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
A Novel Community Detection Based Genetic Algorithm for Feature Selection [3.8848561367220276]
Authors propose a genetic algorithm based on community detection, which functions in three steps. Nine benchmark classification problems were analyzed in terms of the performance of the presented approach.
arXiv Detail & Related papers (2020-08-08T15:39:30Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation. The proposed algorithm is able to well preserve the pairwise distances, as well as topological patterns, of the full data.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)
New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.