Ensemble feature selection with clustering for analysis of
high-dimensional, correlated clinical data in the search for Alzheimer's
disease biomarkers
- URL: http://arxiv.org/abs/2207.02380v1
- Date: Wed, 6 Jul 2022 01:03:50 GMT
- Title: Ensemble feature selection with clustering for analysis of
high-dimensional, correlated clinical data in the search for Alzheimer's
disease biomarkers
- Authors: Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry
Brodaty, Arcot Sowmya (for the Sydney Memory and Ageing Study and the
Alzheimer's Disease Neuroimaging Initiative)
- Abstract summary: We present a novel framework to create feature selection ensembles from multivariate feature selectors.
We take into account the biases produced by groups of correlated features, using agglomerative hierarchical clustering in a pre-processing step.
These methods were applied to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Healthcare datasets often contain groups of highly correlated features, such
as features from the same biological system. When feature selection is applied
to these datasets to identify the most important features, the biases inherent
in some multivariate feature selectors due to correlated features make it
difficult for these methods to distinguish between the important and irrelevant
features, and the results of the feature selection process can be unstable.
Feature selection ensembles, which aggregate the results of multiple individual
base feature selectors, have been investigated as a means of stabilising
feature selection results, but do not address the problem of correlated
features. We present a novel framework to create feature selection ensembles
from multivariate feature selectors while taking into account the biases
produced by groups of correlated features, using agglomerative hierarchical
clustering in a pre-processing step. These methods were applied to two
real-world datasets from studies of Alzheimer's disease (AD), a progressive
neurodegenerative disease that has no cure and is not yet fully understood. Our
results show a marked improvement in the stability of the selected features
compared with the models without clustering, and the features selected by these
models are in keeping with the findings in the AD literature.