Beyond Discrete Selection: Continuous Embedding Space Optimization for
Generative Feature Selection
- URL: http://arxiv.org/abs/2302.13221v4
- Date: Fri, 15 Sep 2023 02:23:40 GMT
- Title: Beyond Discrete Selection: Continuous Embedding Space Optimization for
Generative Feature Selection
- Authors: Meng Xiao and Dongjie Wang and Min Wu and Pengfei Wang and Yuanchun
Zhou and Yanjie Fu
- Abstract summary: We reformulate the feature selection problem as a deep differentiable optimization task.
We propose a new principled research perspective: conceptualizing discrete feature subsetting as continuous embedding space optimization.
Specifically, we utilize reinforcement-learning-based feature selection to generate diverse and high-quality training data.
- Score: 34.32619834917906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Feature Selection - comprising filter, wrapper, and embedded
approaches - is to find the optimal feature subset for designated downstream
tasks. Nevertheless, current feature selection methods are limited in two
ways: 1) their selection criteria vary across domains, making them hard to
generalize; 2) their selection performance drops significantly when processing
a high-dimensional feature space coupled with a small sample size. In light of
these challenges, we pose the question: can selected feature subsets be more
robust, accurate, and input-dimensionality agnostic? In
this paper, we reformulate the feature selection problem as a deep
differentiable optimization task and propose a new research perspective:
conceptualizing discrete feature subsetting as continuous embedding space
optimization. We introduce a novel and principled framework that encompasses a
sequential encoder, an accuracy evaluator, a sequential decoder, and a gradient
ascent optimizer. This comprehensive framework includes four important steps:
preparation of feature-accuracy training data, deep feature subset embedding,
gradient-optimized search, and feature subset reconstruction. Specifically, we
utilize reinforcement-learning-based feature selection to generate diverse and
high-quality training data and enhance generalization. By optimizing
reconstruction and accuracy losses, we embed feature selection knowledge into a
continuous space using an encoder-evaluator-decoder model structure. We employ
a gradient ascent search algorithm to find better embeddings in the learned
embedding space. Furthermore, we reconstruct feature selection solutions using
these embeddings and select the feature subset with the highest performance for
downstream tasks as the optimal subset.
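To make the pipeline concrete, here is a minimal PyTorch sketch of steps 2-4: embedding feature-selection masks with a joint reconstruction/accuracy loss, gradient-ascent search in the embedding space, and decoding back to a subset. It is an illustration under simplifying assumptions, not the authors' implementation: it embeds binary masks with MLPs instead of the paper's sequential encoder/decoder, and it assumes the (mask, accuracy) training pairs from step 1 have already been collected.

```python
# Illustrative sketch only; module sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubsetAutoencoder(nn.Module):
    """Encoder-evaluator-decoder over binary feature-selection masks."""
    def __init__(self, n_features, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))
        self.evaluator = nn.Sequential(nn.Linear(emb_dim, 32), nn.ReLU(),
                                       nn.Linear(32, 1))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

def train_step(model, opt, masks, accs):
    """Step 2: joint reconstruction + accuracy-prediction loss."""
    z = model.encoder(masks)
    loss = (F.binary_cross_entropy_with_logits(model.decoder(z), masks)
            + F.mse_loss(model.evaluator(z).squeeze(-1), accs))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def gradient_ascent_search(model, z0, steps=50, lr=0.1):
    """Step 3: push an embedding uphill on predicted accuracy,
    then decode it back into a binary mask (step 4)."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        model.evaluator(z).sum().backward()
        with torch.no_grad():
            z += lr * z.grad
        z.grad.zero_()
    with torch.no_grad():
        return (torch.sigmoid(model.decoder(z)) > 0.5).float()
```

In the paper, the search is seeded from embeddings of the best-performing training subsets; decoded masks are then re-evaluated on the downstream task and the best one is kept as the final subset.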
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search-space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
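The blurb does not spell out the shrinking phases, so the following is only a hedged sketch of the general idea: cheaply rank features and prune the bulk of them before handing the reduced space to the expensive evolutionary search. The mutual-information criterion and the keep ratio are illustrative assumptions, not LMSSS's actual phases.

```python
# Hedged sketch of search-space shrinking; criterion and ratio are assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def shrink_search_space(X, y, keep_ratio=0.2):
    scores = mutual_info_classif(X, y)           # cheap relevance filter
    k = max(1, int(keep_ratio * X.shape[1]))
    survivors = np.argsort(scores)[-k:]          # keep top-k features
    return survivors  # the evolutionary search then runs over this smaller space
```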
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
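As a hedged illustration of the max-margin idea (not the paper's model, which additionally handles non-monotonic marginal preferences and incremental questioning), one can solve a small LP that places each alternative's score inside its class band with the widest margin:

```python
# Toy max-margin sorting LP; a simplified stand-in, not the cited model.
import numpy as np
from scipy.optimize import linprog

def max_margin_sort(X, labels, n_classes):
    """Variables: [w (d weights), b (n_classes-1 thresholds), m (margin)]."""
    n, d = X.shape
    nb = n_classes - 1
    c = np.zeros(d + nb + 1)
    c[-1] = -1.0                                 # maximize m == minimize -m
    rows, ub = [], []
    for x, h in zip(X, labels):
        if h > 0:                                # w.x >= b[h-1] + m
            r = np.zeros(d + nb + 1); r[:d] = -x; r[d + h - 1] = 1.0; r[-1] = 1.0
            rows.append(r); ub.append(0.0)
        if h < n_classes - 1:                    # w.x <= b[h] - m
            r = np.zeros(d + nb + 1); r[:d] = x; r[d + h] = -1.0; r[-1] = 1.0
            rows.append(r); ub.append(0.0)
    bounds = [(-1, 1)] * d + [(None, None)] * nb + [(0, None)]
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(ub), bounds=bounds)
    return res.x[:d], res.x[d:d + nb], res.x[-1]  # weights, thresholds, margin
```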
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation [22.87577374767465]
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and a measure of feature-subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to preserve the intelligence of feature selection into a continuous embedding space for efficient search.
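A minimal sketch of such a data collector might look as follows; the redundancy measure (mean absolute pairwise correlation) and the downstream model are assumptions for illustration:

```python
# Hedged sketch: sample random subsets, record (tokens, performance, redundancy).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def collect_samples(X, y, n_samples=100, rng=np.random.default_rng(0)):
    records, d = [], X.shape[1]
    for _ in range(n_samples):
        k = rng.integers(1, d + 1)
        subset = np.sort(rng.choice(d, size=k, replace=False))
        perf = cross_val_score(RandomForestClassifier(n_estimators=50),
                               X[:, subset], y, cv=3).mean()
        if k == 1:
            redund = 0.0
        else:  # mean absolute pairwise correlation as an assumed redundancy proxy
            corr = np.corrcoef(X[:, subset], rowvar=False)
            redund = np.abs(corr[np.triu_indices(k, 1)]).mean()
        records.append((subset.tolist(), perf, redund))  # ID tokens, score, redundancy
    return records
```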
arXiv Detail & Related papers (2024-04-26T05:01:08Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained on a joint objective of sequential reconstruction, variational, and performance-evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
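A hedged sketch of such a joint objective, with assumed weighting hyperparameters alpha and beta:

```python
# Illustrative joint loss; weights alpha/beta are assumed hyperparameters.
import torch
import torch.nn.functional as F

def joint_loss(logits, targets, mu, logvar, perf_pred, perf_true,
               alpha=0.1, beta=1.0):
    recon = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                            targets.reshape(-1))           # sequence reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # variational term
    evalu = F.mse_loss(perf_pred, perf_true)               # utility-score regression
    return recon + alpha * kl + beta * evalu
```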
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Multi-objective Binary Coordinate Search for Feature Selection [0.24578723416255746]
We propose the binary multi-objective coordinate search (MOCS) algorithm to solve large-scale feature selection problems.
Results indicate the significant superiority of our method over NSGA-II on five real-world large-scale datasets.
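A hedged single-solution sketch of binary coordinate search over the two objectives (maximize accuracy, minimize subset size); the real MOCS maintains a Pareto front rather than accepting only dominating flips:

```python
# Simplified coordinate search; acceptance rule is an assumption.
import numpy as np

def dominates(a, b):  # objectives (accuracy, -n_selected), both maximized
    return all(x >= y for x, y in zip(a, b)) and a != b

def coordinate_search(eval_acc, d, n_passes=3, rng=np.random.default_rng(0)):
    mask = rng.integers(0, 2, size=d).astype(bool)
    obj = (eval_acc(mask), -int(mask.sum()))
    for _ in range(n_passes):
        for j in range(d):
            mask[j] ^= True                      # flip coordinate j
            cand = (eval_acc(mask), -int(mask.sum()))
            if dominates(cand, obj):
                obj = cand                       # keep the dominating flip
            else:
                mask[j] ^= True                  # revert
    return mask, obj
```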
arXiv Detail & Related papers (2024-02-20T00:50:26Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
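A hedged sketch of what an input-gradient analogue of Lasso can look like: penalize the per-feature norm of the loss gradient with respect to the inputs, so that irrelevant features end up with flat, prunable gradients. The penalty form and weight are assumptions, not the benchmark's exact recipe.

```python
# Illustrative input-gradient group-lasso penalty; lam is an assumed weight.
import torch
import torch.nn as nn

def deep_lasso_loss(model, X, y, lam=1e-2):
    X = X.clone().requires_grad_(True)
    loss = nn.functional.mse_loss(model(X).squeeze(-1), y)
    (grads,) = torch.autograd.grad(loss, X, create_graph=True)
    penalty = grads.pow(2).sum(dim=0).sqrt().sum()   # L2 over batch, L1 over features
    return loss + lam * penalty
```

After training, features can be ranked by the magnitude of their input gradients and the flattest ones pruned.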
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adapt Monte Carlo tree search, equipped with rejection sampling and equivalent-form detection.
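A toy, flat-sampling stand-in for the search loop (the paper uses Monte Carlo tree search over the expression super-tree; only the rejection of duplicate/equivalent forms is illustrated here, and the proxy-task scorer `score` is assumed supplied):

```python
# Toy expression sampler with equivalent-form rejection; grammar is an assumption.
import random

UNARY = ["neg", "sign"]
BINARY = ["add", "mul"]
LEAVES = ["g", "m"]                  # gradient, momentum

def sample_expr(depth=2, rng=random):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(UNARY), sample_expr(depth - 1, rng))
    return (rng.choice(BINARY), sample_expr(depth - 1, rng),
            sample_expr(depth - 1, rng))

def canonical(e):                    # sort args of commutative ops
    if isinstance(e, tuple) and e[0] in BINARY:
        return (e[0], *sorted(map(canonical, e[1:]), key=str))
    return e if isinstance(e, str) else (e[0], canonical(e[1]))

def search(score, n_samples=500, rng=random):
    seen, best = set(), (float("-inf"), None)
    for _ in range(n_samples):
        e = canonical(sample_expr(3, rng))
        if e in seen:                # reject duplicates / equivalent forms
            continue
        seen.add(e)
        best = max(best, (score(e), e), key=lambda t: t[0])
    return best
```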
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piecewise-constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
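A hedged sketch of the surrogate side: a random forest as the model, per-tree prediction spread as the uncertainty estimate, and an upper confidence bound maximized over sampled candidates that satisfy a known constraint (the paper optimizes the acquisition exactly rather than by sampling):

```python
# Illustrative tree-ensemble surrogate; UCB and sampling are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def propose(X_obs, y_obs, sample_candidates, constraint, n_cand=1000, kappa=2.0):
    forest = RandomForestRegressor(n_estimators=100).fit(X_obs, y_obs)
    cand = np.array([c for c in sample_candidates(n_cand) if constraint(c)])
    per_tree = np.stack([t.predict(cand) for t in forest.estimators_])
    ucb = per_tree.mean(0) + kappa * per_tree.std(0)  # explore via tree spread
    return cand[np.argmax(ucb)]
```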
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is shown to be more accurate and efficient than existing algorithms.
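The blurb does not give the Compactness Score formula, so the following is a stand-in in the same spirit rather than CSUFS itself: a fast unsupervised filter that scores each feature by how little it varies inside local neighborhoods relative to its global variance, a simplified Laplacian-score-style criterion.

```python
# Stand-in compactness-style filter; NOT the paper's CSUFS formula.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_compactness_scores(X, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    local_var = ((X[:, None, :] - X[idx[:, 1:], :]) ** 2).mean(axis=(0, 1))
    return local_var / (X.var(axis=0) + 1e-12)     # lower = more compact

def select(X, n_keep, k=5):
    return np.argsort(local_compactness_scores(X, k))[:n_keep]
```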
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features that minimize variance without jeopardizing bias is critical to successfully training a machine learning model.
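A hedged sketch of the setup: an epsilon-greedy agent adds one feature per step and is rewarded by the gain in validation accuracy; per-feature Q-values and the d/2 budget are deliberate simplifications of the cited work.

```python
# Simplified RL-style feature selection; bandit-like Q-values are an assumption.
import numpy as np

def rl_select(eval_acc, d, episodes=30, eps=0.2, rng=np.random.default_rng(0)):
    q, counts = np.zeros(d), np.zeros(d)
    best_mask, best_acc = None, -np.inf
    for _ in range(episodes):
        mask, acc = np.zeros(d, dtype=bool), 0.0
        for _ in range(d // 2):                        # fixed selection budget
            avail = np.flatnonzero(~mask)
            a = (rng.choice(avail) if rng.random() < eps
                 else avail[np.argmax(q[avail])])
            mask[a] = True
            new_acc = eval_acc(mask)
            counts[a] += 1
            q[a] += (new_acc - acc - q[a]) / counts[a]  # reward = accuracy gain
            acc = new_acc
        if acc > best_acc:
            best_mask, best_acc = mask.copy(), acc
    return best_mask, best_acc
```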
arXiv Detail & Related papers (2021-01-23T09:24:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.