Beyond Discrete Selection: Continuous Embedding Space Optimization for
Generative Feature Selection
- URL: http://arxiv.org/abs/2302.13221v4
- Date: Fri, 15 Sep 2023 02:23:40 GMT
- Title: Beyond Discrete Selection: Continuous Embedding Space Optimization for
Generative Feature Selection
- Authors: Meng Xiao and Dongjie Wang and Min Wu and Pengfei Wang and Yuanchun
Zhou and Yanjie Fu
- Abstract summary: We reformulate the feature selection problem as a deep differentiable optimization task.
We propose a new principled research perspective: conceptualizing discrete feature subsetting as continuous embedding space optimization.
Specifically, we utilize reinforcement feature selection learning to generate diverse and high-quality training data.
- Score: 34.32619834917906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Feature Selection - comprising filter, wrapper, and embedded
approaches - is to find the optimal feature subset for designated downstream
tasks. Nevertheless, current feature selection methods are limited in two ways:
1) their selection criteria vary across domains, making them hard to
generalize; and 2) their performance drops significantly on high-dimensional
feature spaces with small sample sizes. In light of these challenges, we pose
the question: can selected feature subsets be more robust, accurate, and
input-dimensionality agnostic? In
this paper, we reformulate the feature selection problem as a deep
differentiable optimization task and propose a new research perspective:
conceptualizing discrete feature subsetting as continuous embedding space
optimization. We introduce a novel and principled framework that encompasses a
sequential encoder, an accuracy evaluator, a sequential decoder, and a gradient
ascent optimizer. This comprehensive framework includes four important steps:
preparation of features-accuracy training data, deep feature subset embedding,
gradient-optimized search, and feature subset reconstruction. Specifically, we
utilize reinforcement feature selection learning to generate diverse and
high-quality training data and enhance generalization. By optimizing
reconstruction and accuracy losses, we embed feature selection knowledge into a
continuous space using an encoder-evaluator-decoder model structure. We employ
a gradient ascent search algorithm to find better embeddings in the learned
embedding space. Furthermore, we reconstruct feature selection solutions using
these embeddings and select the feature subset with the highest performance for
downstream tasks as the optimal subset.
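The search step described above can be illustrated with a minimal numpy sketch. Everything here is an illustrative stand-in, not the paper's trained networks: the evaluator is a toy differentiable surrogate for downstream accuracy, the decoder is a fixed random linear map followed by thresholding, and the gradient is analytic rather than backpropagated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: an "embedding" is a dense vector; the evaluator scores it;
# the decoder maps an embedding back to a binary feature mask.
n_features = 8
dim = 4

W_dec = rng.normal(size=(dim, n_features))   # stand-in for a trained decoder
target = rng.normal(size=dim)                # optimum of the toy evaluator

def evaluator(z):
    # Differentiable surrogate accuracy: peaks when z == target.
    return -np.sum((z - target) ** 2)

def evaluator_grad(z):
    # Analytic gradient of the surrogate (in the paper this would come
    # from backpropagation through the trained evaluator network).
    return -2.0 * (z - target)

def decode(z):
    # Reconstruct a feature subset: threshold decoder logits at zero.
    return (z @ W_dec > 0).astype(int)

# Gradient-ascent search in the embedding space.
z = rng.normal(size=dim)          # start from an embedded training sample
lr = 0.1
for _ in range(200):
    z = z + lr * evaluator_grad(z)

subset = decode(z)
print("surrogate score:", round(evaluator(z), 6))
print("selected feature mask:", subset)
```

In the full framework, the embedding space is learned jointly with the evaluator and decoder, so ascending the evaluator's gradient moves toward embeddings that decode to higher-performing feature subsets.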
Related papers
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation [22.87577374767465]
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to embed feature selection knowledge into a continuous space for efficient search.
arXiv Detail & Related papers (2024-04-26T05:01:08Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - Multi-objective Binary Coordinate Search for Feature Selection [0.24578723416255746]
We propose the binary multi-objective coordinate search (MOCS) algorithm to solve large-scale feature selection problems.
Results indicate the significant superiority of our method over NSGA-II, on five real-world large-scale datasets.
arXiv Detail & Related papers (2024-02-20T00:50:26Z) - A Performance-Driven Benchmark for Feature Selection in Tabular Deep
Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
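The input-gradient idea can be sketched as follows: rank features by the mean absolute gradient of the model's output with respect to each input dimension. This is a simplified stand-in, not the benchmark paper's method; the model here is a linear least-squares fit, so its input gradient is just the weight vector, whereas a real neural network would require backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data: only features 0, 2, and 5 are informative.
n, d = 256, 6
X = rng.normal(size=(n, d))
w_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Fit a linear model by least squares (stand-in for a trained network).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def input_gradient(x):
    # For f(x) = x @ w_hat the input gradient is w_hat for every x;
    # a neural network would compute this per-example via backprop.
    return w_hat

# Score each feature by its mean absolute input gradient over the data.
scores = np.mean([np.abs(input_gradient(x)) for x in X], axis=0)
top_k = np.argsort(scores)[::-1][:3]
print("feature scores:", np.round(scores, 3))
print("top-3 features:", sorted(top_k.tolist()))
```

The "Lasso analogue" framing comes from the fact that, for a linear model, ranking by input-gradient magnitude reduces to ranking by coefficient magnitude, which is what sparsity-inducing penalties effectively do.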
arXiv Detail & Related papers (2023-11-10T05:26:10Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - i-Razor: A Differentiable Neural Input Razor for Feature Selection and
Dimension Search in DNN-Based Recommender Systems [8.992480061695138]
Noisy features and inappropriate embedding dimension assignments can deteriorate the performance of recommender systems.
We propose a differentiable neural input razor (i-Razor) that enables joint optimization of feature selection and dimension search.
arXiv Detail & Related papers (2022-04-01T08:30:06Z) - Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Experiments show the proposed algorithm to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features that minimize variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.