Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent
- URL: http://arxiv.org/abs/2403.04015v1
- Date: Wed, 6 Mar 2024 19:58:19 GMT
- Title: Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent
- Authors: Xinyuan Wang, Dongjie Wang, Wangyang Ying, Rui Xie, Haifeng Chen, Yanjie Fu
- Abstract summary: We introduce an innovative framework for feature selection guided by knockoff features and optimized through reinforcement learning.
A deep Q-network, pre-trained with the original features and their corresponding pseudo labels, is employed to improve the efficacy of the exploration process.
A new epsilon-greedy strategy is used, incorporating insights from the pseudo labels to make the feature selection process more effective.
- Score: 44.84307718534031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection improves the AI-readiness of data by eliminating
redundant features. Prior research falls into two primary categories: i)
Supervised Feature Selection (SFS), which identifies the optimal feature subset
based on its relevance to the target variable; ii) Unsupervised Feature
Selection (UFS), which reduces the dimensionality of the feature space by
capturing the essential information within the feature set instead of using the
target variable. However, SFS approaches suffer from time-consuming processes
and limited generalizability due to their dependence on the target variable and
downstream ML tasks. UFS methods are constrained by the fact that the deduced
feature space is latent and untraceable. To address these challenges, we introduce an innovative framework
for feature selection, which is guided by knockoff features and optimized
through reinforcement learning, to identify the optimal and effective feature
subset. In detail, our method generates "knockoff" features that replicate the
distribution and characteristics of the original features but are independent
of the target variable. Each feature is then assigned a pseudo label based on
its correlation with all the knockoff features, serving as a novel metric for
feature evaluation. Our approach uses these pseudo labels to guide the feature
selection process in three novel ways, optimized by a single reinforced agent:
(1) a deep Q-network, pre-trained with the original features and their
corresponding pseudo labels, is employed to improve the efficacy of the
exploration process in feature selection; (2) we introduce unsupervised rewards
that evaluate feature subset quality based on the pseudo labels and the feature
space reconstruction loss, reducing dependence on the target variable; and (3)
a new ε-greedy strategy incorporates insights from the pseudo labels to make
the feature selection process more effective.
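The abstract does not specify how the knockoffs are generated or exactly how a pseudo label is derived from a feature's correlations with the knockoffs. As a minimal sketch under stated assumptions, the Python code below swaps in standard equicorrelated Gaussian model-X knockoffs, which are built without ever looking at the target. It then applies a hypothetical labeling rule: a feature gets pseudo label 1 when its mean absolute correlation with all knockoff columns is below the median, on the reasoning that a feature resembling many knockoffs carries little distinct information. The names `gaussian_knockoffs` and `pseudo_labels`, the median threshold, and the direction of the rule are assumptions, not the authors' method.

```python
import numpy as np


def gaussian_knockoffs(X, rng):
    """Equicorrelated Gaussian model-X knockoffs (a standard construction, swapped in here).

    Returns the standardized originals and their knockoff copies, both n x p.
    The target variable is never used, so the knockoffs carry no target information.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    sigma = np.corrcoef(Xs, rowvar=False)
    # Equicorrelated choice s = min(1, 2 * lambda_min(Sigma)), shrunk slightly
    # so that the conditional covariance below stays positive semidefinite.
    s = 0.999 * min(1.0, 2.0 * np.linalg.eigvalsh(sigma).min())
    D = s * np.eye(p)
    sigma_inv = np.linalg.inv(sigma)
    # Knockoffs | originals ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D), row by row.
    mean = Xs - Xs @ sigma_inv @ D
    cov = 2.0 * D - D @ sigma_inv @ D
    w, V = np.linalg.eigh(cov)
    root = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # symmetric square root of cov
    return Xs, mean + rng.standard_normal((n, p)) @ root


def pseudo_labels(Xs, Xk):
    """HYPOTHETICAL rule: label 1 if a feature's mean |corr| with all knockoffs is below the median."""
    p = Xs.shape[1]
    full = np.corrcoef(Xs, Xk, rowvar=False)   # (2p x 2p) correlation matrix
    cross = np.abs(full[:p, p:])                # originals (rows) vs knockoffs (columns)
    score = cross.mean(axis=1)
    return (score < np.median(score)).astype(int), score


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.standard_normal(500)  # feature 1 is largely redundant with feature 0
Xs, Xk = gaussian_knockoffs(X, rng)
labels, score = pseudo_labels(Xs, Xk)
```

In this toy run the two mutually redundant features (0 and 1) both receive pseudo label 0, because each correlates strongly with two knockoff columns rather than one.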
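The abstract states only that the new ε-greedy strategy incorporates insights from the pseudo labels. One hypothetical reading, sketched below in Python, keeps the usual greedy step (take the action with the highest Q-value from the pre-trained Q-network) but, on an exploration step, follows the feature's pseudo label with probability `trust` instead of always drawing a uniformly random action. The function `guided_epsilon_greedy`, the `trust` parameter, and the two-action encoding (0 = deselect, 1 = select) are illustrative assumptions.

```python
import numpy as np


def guided_epsilon_greedy(q_values, pseudo_label, epsilon, trust, rng):
    """HYPOTHETICAL pseudo-label-guided epsilon-greedy choice for one feature.

    q_values: array [Q(deselect), Q(select)] from the agent's Q-network.
    Returns 0 (deselect) or 1 (select).
    """
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))        # exploit: greedy action
    if rng.random() < trust:
        return int(pseudo_label)               # explore, guided by the pseudo label
    return int(rng.integers(2))                # explore uniformly at random


rng = np.random.default_rng(1)
q = np.array([0.7, 0.2])   # the Q-network currently prefers deselecting this feature
greedy = [guided_epsilon_greedy(q, 1, 0.0, 0.8, rng) for _ in range(5)]
guided = [guided_epsilon_greedy(q, 1, 1.0, 1.0, rng) for _ in range(5)]
```

With `epsilon = 0` the agent always follows its Q-values; with `epsilon = 1` and `trust = 1` every exploratory step follows the pseudo label. Intermediate values trade off the two sources of guidance.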
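The exact form of the unsupervised reward is likewise not given in the abstract. The sketch below is a hypothetical reward that combines the two ingredients the abstract names: agreement between the selected subset and the pseudo labels, and a feature space reconstruction loss. Here the loss is the normalized least-squares error of rebuilding every original column from the selected columns. The weight `alpha`, the least-squares reconstructor, and the name `unsupervised_reward` are assumptions; the paper may use a learned reconstruction model instead.

```python
import numpy as np


def unsupervised_reward(X, selected, pseudo_labels, alpha=0.5):
    """HYPOTHETICAL reward: pseudo-label agreement minus normalized reconstruction error."""
    mask = np.asarray(selected, dtype=bool)
    if not mask.any():
        return -np.inf                         # an empty subset reconstructs nothing
    # Fraction of features whose select/deselect decision matches the pseudo label.
    agreement = np.mean(mask == np.asarray(pseudo_labels, dtype=bool))
    Xsel = X[:, mask]
    # Least-squares reconstruction of every original column from the selected columns.
    coef, *_ = np.linalg.lstsq(Xsel, X, rcond=None)
    recon_error = np.mean((X - Xsel @ coef) ** 2) / np.mean(X ** 2)  # 0 means perfect
    return alpha * agreement - (1.0 - alpha) * recon_error


rng = np.random.default_rng(2)
X = rng.standard_normal((200, 4))
X[:, 3] = X[:, 0] + X[:, 1]               # feature 3 is fully redundant
labels = np.array([1, 1, 1, 0])
r_good = unsupervised_reward(X, [1, 1, 1, 0], labels)
r_bad = unsupervised_reward(X, [1, 0, 0, 0], labels)
```

Dropping the redundant feature costs nothing in reconstruction, so `r_good` is about 0.5, while the one-feature subset loses both agreement and reconstruction quality. No target variable appears anywhere in the reward.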
Related papers
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
Date: 2024-07-02T14:12:21Z
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation [22.87577374767465]
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to preserve the intelligence of feature selection in a continuous embedding space for efficient search.
Date: 2024-04-26T05:01:08Z
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained on a joint objective of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
Date: 2024-03-06T16:31:56Z
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features from observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
Date: 2023-10-17T08:04:45Z
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time taken by the other methods under comparison.
Date: 2022-03-03T10:50:33Z
- Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection [11.214334712819396]
Selecting uncorrelated and discriminative features is the key problem of unsupervised feature selection.
We present a novel generalized regression model with an uncorrelated constraint and $\ell_{2,1}$-norm regularization.
It can simultaneously select the uncorrelated and discriminative features as well as reduce the variance of these data points belonging to the same neighborhood.
Date: 2020-12-27T09:07:26Z
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
Date: 2020-12-26T07:35:35Z
- Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent [32.483981722074574]
We develop a single-agent reinforced feature selection approach integrated with restructured choice strategy.
We use a single agent to handle the selection of multiple features, instead of using multiple agents.
We propose a convolutional auto-encoder algorithm, integrated with the encoded index information of features, to improve state representation.
Date: 2020-09-19T13:41:39Z
- Infinite Feature Selection: A Graph-based Feature Filtering Approach [78.63188057505012]
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Going to infinity allows the computational complexity of the selection process to be constrained.
We show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep is fixed a priori.
Date: 2020-06-15T07:20:40Z
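The PFST entry above uses the trace ratio criterion. When a subset $S$ is chosen by a 0/1 selection matrix, the ratio reduces to $\sum_{i\in S} b_i / \sum_{i\in S} w_i$, where $b_i$ and $w_i$ are the diagonal entries of the between-class and within-class scatter matrices. The sequential Python sketch below maximizes this ratio with a common Dinkelbach-style iteration: fix $\lambda$, take the top-k features by $b_i - \lambda w_i$, then update $\lambda$. It is not PFST's parallel algorithm, and the names `scatter_diagonals` and `trace_ratio_select` are hypothetical.

```python
import numpy as np


def scatter_diagonals(X, y):
    """Diagonals of the between-class (b) and within-class (w) scatter matrices."""
    mu = X.mean(axis=0)
    b = np.zeros(X.shape[1])
    w = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        b += len(Xc) * (mc - mu) ** 2           # spread of class means
        w += ((Xc - mc) ** 2).sum(axis=0)       # spread inside each class
    return b, w


def trace_ratio_select(X, y, k, iters=50):
    """Select k features maximizing sum(b[S]) / sum(w[S])."""
    b, w = scatter_diagonals(X, y)
    subset = np.argsort(-(b / w))[:k]           # start from the best per-feature ratios
    for _ in range(iters):
        lam = b[subset].sum() / w[subset].sum()
        candidate = np.argsort(-(b - lam * w))[:k]  # best k features for the current lambda
        if set(candidate.tolist()) == set(subset.tolist()):
            break                               # fixed point: lambda is optimal
        subset = candidate
    lam = b[subset].sum() / w[subset].sum()
    return np.sort(subset), lam


rng = np.random.default_rng(4)
y = np.repeat([0, 1, 2], 40)
X = rng.standard_normal((120, 6))
X[:, 0] += 2.0 * y    # strongly class-dependent feature
X[:, 3] += 1.0 * y    # moderately class-dependent feature
subset, ratio = trace_ratio_select(X, y, k=2)
```

Each update cannot decrease $\lambda$, so the loop stops at a subset whose ratio matches an exhaustive search; here it picks features 0 and 3.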
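The Inf-FS entry above describes scoring features by the paths through a feature graph. A minimal Python sketch of that idea, assuming a simplified edge weight that mixes each pair's normalized spread with one minus the absolute Pearson correlation, sums the contributions of paths of every length with the geometric series $\sum_{l\ge 1}(rA)^l = (I - rA)^{-1} - I$. This series converges when $r < 1/\rho(A)$. The functions `inf_fs_adjacency` and `inf_fs_scores`, `alpha = 0.5`, and the factor 0.9 in $r$ are illustrative choices, not necessarily those of the paper.

```python
import numpy as np


def inf_fs_adjacency(X, alpha=0.5):
    """Simplified Inf-FS feature graph (ASSUMED weighting, see lead-in)."""
    std = X.std(axis=0)
    # Normalized spread of each pair: max(std_i, std_j) / largest std over all features.
    spread = np.maximum.outer(std, std) / std.max()
    # Decorrelation: 1 - |Pearson correlation| between the two features.
    decorr = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    return alpha * spread + (1.0 - alpha) * decorr


def inf_fs_scores(A, scale=0.9):
    """Score each feature by the total weight of all paths (any length) starting at it."""
    p = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))  # spectral radius of the symmetric A
    r = scale / rho                              # r < 1/rho, so the series converges
    # Closed form of sum_{l>=1} (r A)^l: one p x p inversion instead of a loop over lengths.
    S = np.linalg.inv(np.eye(p) - r * A) - np.eye(p)
    return S.sum(axis=1), r


rng = np.random.default_rng(3)
X = rng.standard_normal((300, 5)) * np.array([1.0, 1.0, 3.0, 1.0, 1.0])
X[:, 4] = X[:, 3] + 0.05 * rng.standard_normal(300)  # feature 4 nearly duplicates feature 3
A = inf_fs_adjacency(X)
scores, r = inf_fs_scores(A)
ranking = np.argsort(-scores)  # most relevant feature first
```

The closed form is why "going to infinity" bounds the cost: paths of all lengths are accounted for by a single matrix inversion. In this toy data the high-variance, uncorrelated feature 2 ranks first.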