Composite Feature Selection using Deep Ensembles
- URL: http://arxiv.org/abs/2211.00631v1
- Date: Tue, 1 Nov 2022 17:49:40 GMT
- Title: Composite Feature Selection using Deep Ensembles
- Authors: Fergus Imrie, Alexander Norcliffe, Pietro Lio, Mihaela van der Schaar
- Abstract summary: We investigate the problem of discovering groups of predictive features without predefined grouping.
We introduce a novel deep learning architecture that uses an ensemble of feature selection models to find predictive groups.
We propose a new metric to measure similarity between discovered groups and the ground truth.
- Score: 130.72015919510605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real world problems, features do not act alone but in combination
with each other. For example, in genomics, diseases might not be caused by any
single mutation but require the presence of multiple mutations. Prior work on
feature selection either seeks to identify individual features or can only
determine relevant groups from a predefined set. We investigate the problem of
discovering groups of predictive features without predefined grouping. To do
so, we define predictive groups in terms of linear and non-linear interactions
between features. We introduce a novel deep learning architecture that uses an
ensemble of feature selection models to find predictive groups, without
requiring candidate groups to be provided. The selected groups are sparse and
exhibit minimum overlap. Furthermore, we propose a new metric to measure
similarity between discovered groups and the ground truth. We demonstrate the
utility of our model on multiple synthetic tasks and semi-synthetic chemistry
datasets, where the ground truth structure is known, as well as an image
dataset and a real-world cancer dataset.
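The paper's own group-similarity metric is defined in the full text; purely as an illustration (this is not the authors' metric), a mean best-match Jaccard score is one simple way to compare discovered feature groups against ground-truth groups:

```python
def jaccard(a, b):
    """Jaccard similarity between two feature groups (index sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def group_similarity(discovered, ground_truth):
    """Mean best-match Jaccard: each ground-truth group is matched to
    its most similar discovered group; missing groups lower the score."""
    if not ground_truth:
        return 1.0 if not discovered else 0.0
    return sum(
        max((jaccard(g, d) for d in discovered), default=0.0)
        for g in ground_truth
    ) / len(ground_truth)
```

A perfect recovery of every group scores 1.0, while split, merged, or missing groups are penalised through their reduced overlap with the best-matching discovered group.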
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Automated Model Selection for Tabular Data [0.1797555376258229]
R's mixed-effects linear models library allows users to provide interactive feature combinations in the model design.
We aim to automate the model selection process for predictions on datasets incorporating feature interactions.
The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method.
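The abstract names a Greedy Search method without further detail; one common instantiation (an assumption, not necessarily the paper's exact procedure) is greedy forward selection that trades feature relevance against redundancy using Pearson correlation:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / ((vx * vy) ** 0.5) if vx and vy else 0.0

def greedy_select(features, target, k):
    """Greedily add up to k features, scoring each candidate by its
    correlation with the target minus its worst overlap with features
    already selected. features: list of columns; target: one column."""
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = abs(pearson(features[j], target))
            redundancy = max(
                (abs(pearson(features[j], features[s])) for s in selected),
                default=0.0,
            )
            return relevance - redundancy
        best = max(remaining, key=score)
        if score(best) <= 0:  # no candidate improves the panel
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

A duplicated feature scores high on relevance but is rejected once its twin is selected, because its redundancy term cancels the gain.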
arXiv Detail & Related papers (2024-01-01T21:41:20Z)
- Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection [19.066989850964756]
We introduce a discriminative clustering model that seeks to maximise a geometry-aware generalisation of the mutual information called GEMINI.
This algorithm avoids the burden of feature exploration and scales easily to high-dimensional data and large numbers of samples while fitting only a discriminative clustering model.
Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
arXiv Detail & Related papers (2023-02-07T10:52:04Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Ensemble feature selection with clustering for analysis of high-dimensional, correlated clinical data in the search for Alzheimer's disease biomarkers [0.0]
We present a novel framework to create feature selection ensembles from multivariate feature selectors.
We take into account the biases produced by groups of correlated features, using agglomerative hierarchical clustering in a pre-processing step.
These methods were applied to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood.
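The pre-processing step described above, grouping correlated features before selection, can be sketched with single-linkage agglomerative merging over a correlation matrix; this is an illustrative simplification (using a hard threshold rather than a full dendrogram) and not the paper's exact pipeline:

```python
def correlation_clusters(corr, threshold):
    """Single-linkage agglomerative clustering of features: merge any
    two clusters containing a feature pair with |correlation| >= threshold.
    corr: symmetric matrix (list of lists) of pairwise correlations."""
    n = len(corr)
    parent = list(range(n))

    def find(i):
        # Union-find with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i][j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Downstream, an ensemble selector can then operate on one representative per cluster, reducing the bias that groups of highly correlated features would otherwise introduce.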
arXiv Detail & Related papers (2022-07-06T01:03:50Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show that combining recent results on equivariant representation learning over structured spaces with classical results from causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Flexible variable selection in the presence of missing data [0.0]
We propose a non-parametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data.
We show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance.
arXiv Detail & Related papers (2022-02-25T21:41:03Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and target labels can wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Commutative Lie Group VAE for Disentanglement Learning [96.32813624341833]
We view disentanglement learning as discovering an underlying structure that equivariantly reflects the factorized variations shown in data.
A simple model named Commutative Lie Group VAE is introduced to realize the group-based disentanglement learning.
Experiments show that our model can effectively learn disentangled representations without supervision, and can achieve state-of-the-art performance without extra constraints.
arXiv Detail & Related papers (2021-06-07T07:03:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.