Sequential Attention for Feature Selection
- URL: http://arxiv.org/abs/2209.14881v3
- Date: Tue, 25 Apr 2023 15:48:02 GMT
- Title: Sequential Attention for Feature Selection
- Authors: Taisuke Yasuda, MohammadHossein Bateni, Lin Chen, Matthew Fahrbach,
Gang Fu, Vahab Mirrokni
- Abstract summary: We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks.
We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm.
- Score: 12.89764845700709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature selection is the problem of selecting a subset of features for a
machine learning model that maximizes model quality subject to a budget
constraint. For neural networks, prior methods, including those based on
$\ell_1$ regularization, attention, and other techniques, typically select the
entire feature subset in one evaluation round, ignoring the residual value of
features during selection, i.e., the marginal contribution of a feature given
that other features have already been selected. We propose a feature selection
algorithm called Sequential Attention that achieves state-of-the-art empirical
results for neural networks. This algorithm is based on an efficient one-pass
implementation of greedy forward selection and uses attention weights at each
step as a proxy for feature importance. We give theoretical insights into our
algorithm for linear regression by showing that an adaptation to this setting
is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and
thus inherits all of its provable guarantees. Our theoretical and empirical
analyses offer new explanations towards the effectiveness of attention and its
connections to overparameterization, which may be of independent interest.
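The abstract describes the method at a high level: greedy forward selection, one feature per round, with attention weights serving as the importance proxy. The following is a minimal NumPy sketch of that idea under the simplifying assumption of a linear model trained by gradient descent; the function name, hyperparameters, and training loop are illustrative and are not the authors' implementation, which applies the scheme to neural networks.

```python
import numpy as np

def sequential_attention_select(X, y, k, steps=500, lr=0.1):
    """Greedy forward selection using softmax attention weights as an
    importance proxy (simplified sketch; hyperparameters are illustrative)."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        candidates = [j for j in range(d) if j not in selected]
        logits = np.zeros(len(candidates))  # attention logits over candidates
        w = np.zeros(d)                     # linear model weights
        for _ in range(steps):
            a = np.exp(logits - logits.max())
            a /= a.sum()                    # softmax attention over candidates
            mask = np.zeros(d)
            mask[selected] = 1.0            # selected features pass through
            mask[candidates] = a            # candidates compete via attention
            residual = X @ (mask * w) - y
            grad_out = (2.0 / n) * (X.T @ residual)
            grad_w = grad_out * mask        # gradient of squared loss w.r.t. w
            grad_mask = grad_out * w        # ... and w.r.t. the mask
            g = grad_mask[candidates]
            grad_logits = a * (g - np.dot(a, g))  # backprop through softmax
            w -= lr * grad_w
            logits -= lr * grad_logits
        # the candidate with the largest attention weight joins the selection
        selected.append(candidates[int(np.argmax(logits))])
    return selected
```

In this linear setting, the residual-driven greedy selection the sketch performs is closely related to Orthogonal Matching Pursuit, which is the equivalence the paper establishes theoretically.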
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that feature selection with SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than $50\%$ memory and $55\%$ FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained with a joint objective of sequential reconstruction, variational, and performance-evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - Embedded feature selection in LSTM networks with multi-objective
evolutionary ensemble learning for time series forecasting [49.1574468325115]
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimizes the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - Fair Feature Subset Selection using Multiobjective Genetic Algorithm [0.0]
We present a feature subset selection approach that improves both fairness and accuracy objectives.
We use statistical disparity as a fairness metric and F1-Score as a metric for model performance.
Our experiments on the most commonly used fairness benchmark datasets show that using the evolutionary algorithm we can effectively explore the trade-off between fairness and accuracy.
arXiv Detail & Related papers (2022-04-30T22:51:19Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features, those that minimize the variance without jeopardizing the bias of our models, is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z) - Algorithmic Stability and Generalization of an Unsupervised Feature
Selection Algorithm [20.564573628659918]
Algorithmic stability is a key characteristic of an algorithm regarding its sensitivity to perturbations of input samples.
In this paper, we propose an innovative unsupervised feature selection algorithm attaining this stability with provable guarantees.
arXiv Detail & Related papers (2020-10-19T12:25:39Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features is selected in groups, and the number of selected features is determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - A Novel Community Detection Based Genetic Algorithm for Feature
Selection [3.8848561367220276]
The authors propose a genetic algorithm based on community detection, which functions in three steps.
The performance of the presented approach was evaluated on nine benchmark classification problems.
arXiv Detail & Related papers (2020-08-08T15:39:30Z) - Stepwise Model Selection for Sequence Prediction via Deep Kernel
Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.