Deep Unsupervised Feature Selection by Discarding Nuisance and
Correlated Features
- URL: http://arxiv.org/abs/2110.05306v1
- Date: Mon, 11 Oct 2021 14:26:13 GMT
- Title: Deep Unsupervised Feature Selection by Discarding Nuisance and
Correlated Features
- Authors: Uri Shaham, Ofir Lindenbaum, Jonathan Svirsky and Yuval Kluger
- Abstract summary: Modern datasets contain large subsets of correlated features and nuisance features.
In the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features.
We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features.
- Score: 7.288137686773523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern datasets often contain large subsets of correlated features and
nuisance features, which are only loosely related, if at all, to the main
underlying structures of the data. Nuisance features can be identified using
the Laplacian score criterion, which evaluates the importance of a given
feature via its consistency with the leading eigenvectors of the graph
Laplacian. We demonstrate
that in the presence of large numbers of nuisance features, the Laplacian must
be computed on the subset of selected features rather than on the complete
feature set. To do this, we propose a fully differentiable approach for
unsupervised feature selection, utilizing the Laplacian score criterion to
avoid the selection of nuisance features. We employ an autoencoder architecture
to cope with correlated features, trained to reconstruct the data from the
subset of selected features. We build on the recently proposed concrete layer,
which controls the number of selected features through the architectural design
and thereby simplifies the optimization process. Experimenting on several
real-world datasets, we demonstrate that our proposed approach outperforms
similar approaches that avoid only correlated or only nuisance features, but
not both. Several state-of-the-art clustering results are reported.
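To make the mechanics concrete, here is a minimal sketch (not the authors' code) of the three ingredients the abstract describes: a concrete (Gumbel-softmax) selector layer, a graph Laplacian built from the currently selected features, and a decoder that reconstructs all features from the selected subset. The names `ConcreteSelector` and `laplacian_penalty`, the Gaussian affinity, the penalty weight 0.1, and the omission of feature centering and temperature annealing are all simplifying assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConcreteSelector(nn.Module):
    """Relaxed one-hot ('concrete') layer that picks k of d input features."""

    def __init__(self, d, k, temperature=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(k, d))  # one row per selected feature
        self.temperature = temperature

    def forward(self, x):
        if self.training:
            # Gumbel-softmax relaxation: a differentiable surrogate for argmax
            gumbel = -torch.log(-torch.log(torch.rand_like(self.logits) + 1e-20) + 1e-20)
            probs = F.softmax((self.logits + gumbel) / self.temperature, dim=-1)
        else:
            # At test time, harden to an exact one-hot selection
            probs = F.one_hot(self.logits.argmax(-1), self.logits.shape[-1]).float()
        return x @ probs.t()  # (batch, k): the selected features


def laplacian_penalty(x_sel, sigma=1.0):
    """Laplacian-score-style penalty computed on the *selected* features only,
    matching the abstract's argument; feature centering is omitted for brevity."""
    dist = torch.cdist(x_sel, x_sel) ** 2
    W = torch.exp(-dist / (2 * sigma ** 2))  # Gaussian affinity graph
    D = torch.diag(W.sum(dim=1))             # degree matrix
    L = D - W                                # unnormalized graph Laplacian
    num = torch.einsum("ni,nm,mi->i", x_sel, L, x_sel)
    den = torch.einsum("ni,nm,mi->i", x_sel, D, x_sel) + 1e-8
    return (num / den).mean()  # small when selected features vary smoothly on the graph


d, k = 100, 10
selector = ConcreteSelector(d, k)
# The decoder reconstructs all d features from the k selected ones, which
# discourages picking mutually correlated (redundant) features.
decoder = nn.Sequential(nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, d))
opt = torch.optim.Adam(list(selector.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(256, d)  # stand-in for a real data batch
x_sel = selector(x)
loss = F.mse_loss(decoder(x_sel), x) + 0.1 * laplacian_penalty(x_sel)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the affinity graph inside `laplacian_penalty` is rebuilt from `x_sel` at every step, the Laplacian is computed on the selected subset rather than on the full feature set, which is the abstract's central point. In a faithful implementation the selector temperature would also be annealed toward zero so that the relaxed selection hardens into discrete feature choices.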
Related papers
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation [22.87577374767465]
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to preserve the intelligence of feature selection into a continuous embedding space for efficient search.
arXiv Detail & Related papers (2024-04-26T05:01:08Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint objective of sequential reconstruction, variational, and performance-evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection rely on classical downstream models or toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Unsupervised Features Ranking via Coalitional Game Theory for Categorical Data [0.28675177318965034]
Unsupervised feature selection aims to reduce the number of features.
We show that the derived feature selection outperforms competing methods in lowering the redundancy rate.
arXiv Detail & Related papers (2022-05-17T14:17:36Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Top-$k$ Regularization for Supervised Feature Selection [11.927046591097623]
We introduce a novel, simple yet effective regularization approach, named top-$k$ regularization, to supervised feature selection.
We show that the top-$k$ regularization is effective and stable for supervised feature selection.
arXiv Detail & Related papers (2021-06-04T01:12:47Z)
- Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection [11.214334712819396]
Selecting uncorrelated and discriminative features is the key problem in unsupervised feature selection.
We present a novel generalized regression model with an uncorrelatedness constraint and $\ell_{2,1}$-norm regularization.
It can simultaneously select the uncorrelated and discriminative features as well as reduce the variance of these data points belonging to the same neighborhood.
arXiv Detail & Related papers (2020-12-27T09:07:26Z)
- Infinite Feature Selection: A Graph-based Feature Filtering Approach [78.63188057505012]
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Letting the paths grow infinitely long makes it possible to constrain the computational complexity of the selection process.
We show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep is fixed a priori.
arXiv Detail & Related papers (2020-06-15T07:20:40Z)
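As a hedged aside on the tabular-benchmark entry above: one natural reading of an "input-gradient-based analogue of Lasso" is to move Lasso's penalty from linear coefficients to the gradients of the loss with respect to each input feature, with a group structure over samples. The sketch below is an illustrative assumption, not the benchmark paper's verbatim method; the toy network, the penalty weight 1e-2, and the use of squared error are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.randn(128, 20, requires_grad=True)  # toy batch; input gradients are needed
y = torch.randn(128, 1)

fit = F.mse_loss(net(x), y)
# Gradient of the fitting loss w.r.t. the inputs, kept in the graph
# (create_graph=True) so the penalty below is differentiable w.r.t. the weights.
(grads,) = torch.autograd.grad(fit, x, create_graph=True)
# Group-lasso structure: L2 norm over samples within a feature, L1 across features.
group_norms = grads.pow(2).sum(dim=0).add(1e-12).sqrt()
loss = fit + 1e-2 * group_norms.sum()
opt.zero_grad()
loss.backward()
opt.step()

# Features whose input-gradient group norm shrinks toward zero are candidates
# for removal, analogous to zeroed coefficients in classical Lasso.
importance = group_norms.detach()
```

Ranking features by `importance` and retraining on the top-ranked subset then plays the role that nonzero coefficients play in classical Lasso.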
This list is automatically generated from the titles and abstracts of the papers on this site.