Ensemble feature selection with data-driven thresholding for Alzheimer's disease biomarker discovery
- URL: http://arxiv.org/abs/2207.01822v1
- Date: Tue, 5 Jul 2022 05:50:51 GMT
- Title: Ensemble feature selection with data-driven thresholding for Alzheimer's disease biomarker discovery
- Authors: Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry Brodaty, Arcot Sowmya (for the Sydney Memory and Ageing Study and the Alzheimer's Disease Neuroimaging Initiative)
- Abstract summary: This work develops several data-driven thresholds to automatically identify the relevant features in an ensemble feature selector.
To demonstrate the applicability of these methods to clinical data, they are applied to data from two real-world Alzheimer's disease (AD) studies.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Healthcare datasets present many challenges to both machine learning and
statistics as their data are typically heterogeneous, censored,
high-dimensional and have missing information. Feature selection is often used
to identify the important features but can produce unstable results when
applied to high-dimensional data, selecting a different set of features on each
iteration.
The stability of feature selection can be improved with the use of feature
selection ensembles, which aggregate the results of multiple base feature
selectors. A threshold must be applied to the final aggregated feature set to
separate the relevant features from the redundant ones. A fixed threshold,
which is typically applied, offers no guarantee that the final set of selected
features contains only relevant features. This work develops several
data-driven thresholds to automatically identify the relevant features in an
ensemble feature selector and evaluates their predictive accuracy and
stability.
To demonstrate the applicability of these methods to clinical data, they are
applied to data from two real-world Alzheimer's disease (AD) studies. AD is a
progressive neurodegenerative disease with no known cure. It begins at least
2-3 decades before overt symptoms appear, presenting an opportunity for
researchers to identify early biomarkers that might identify patients at risk
of developing AD. Features identified by applying these methods to both
datasets reflect current findings in the AD literature.
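The abstract describes three linked ideas: aggregating the outputs of several base feature selectors, thresholding the aggregated scores, and measuring stability. The sketch below illustrates them under loudly stated assumptions. Each base selector is summarized only by the set of features it picked; the aggregated score is selection frequency; the "largest gap" rule is a generic stand-in for a data-driven threshold and is not one of the thresholds developed in this work; stability is measured with the average pairwise Jaccard similarity, which may differ from the measure the authors use. The feature names `f1` to `f6` and the five selections are hypothetical.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(base_selections, feature_names):
    """Aggregate an ensemble: fraction of base selectors that chose each feature."""
    counts = Counter(f for selected in base_selections for f in set(selected))
    n = len(base_selections)
    return {f: counts[f] / n for f in feature_names}


def fixed_threshold(scores, cutoff=0.5):
    """Keep features whose score reaches a preset cutoff that ignores the data."""
    return {f for f, s in scores.items() if s >= cutoff}


def largest_gap_threshold(scores):
    """Illustrative data-driven rule (an assumption, not the paper's method):
    sort scores in descending order, find the largest drop between
    consecutive scores, and keep every feature ranked above that drop."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return {f for f, _ in ranked}
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    # list.index returns the first maximum, so ties favour the smaller set.
    last_kept = gaps.index(max(gaps))
    return {f for f, _ in ranked[: last_kept + 1]}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature subsets."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def average_pairwise_jaccard(subsets):
    """Stability: mean Jaccard similarity over all pairs of selected subsets
    (1.0 = the same features every time, 0.0 = no overlap at all)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical output of five base selectors, e.g. one per bootstrap sample.
features = ["f1", "f2", "f3", "f4", "f5", "f6"]
base = [
    {"f1", "f2", "f3", "f4"},
    {"f1", "f2", "f3", "f5"},
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f1", "f2", "f6"},
]
scores = selection_frequency(base, features)
print(scores)
print(sorted(fixed_threshold(scores)))        # → ['f1', 'f2', 'f3']
print(sorted(largest_gap_threshold(scores)))  # → ['f1', 'f2']
# Instability of the individual base selectors across resamples.
print(round(average_pairwise_jaccard(base), 3))  # → 0.555
```

In this toy case the fixed 0.5 cutoff keeps `f3` (score 0.6), while the gap rule stops at the sharp drop from 1.0 to 0.6. Neither choice is correct in general; the point is that a fixed cutoff does not adapt to the score distribution, which is why candidate thresholds are compared on both predictive accuracy and stability.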
Related papers
- Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech [60.08015780474457]
Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models.
We identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting two problems respectively.
arXiv Detail & Related papers (2024-09-22T02:06:05Z)
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Utilizing Semantic Textual Similarity for Clinical Survey Data Feature Selection [4.5574502769585745]
Machine learning models that attempt to predict outcomes from survey data can overfit and result in poor generalizability.
One remedy to this issue is feature selection, which attempts to select an optimal subset of features to learn upon.
The relationships between feature names and target names can be evaluated using language models (LMs) to produce semantic textual similarity (STS) scores.
We examine the performance using STS to select features directly and in the minimal-redundancy-maximal-relevance (mRMR) algorithm.
arXiv Detail & Related papers (2023-08-19T03:10:51Z)
- Towards Understanding the Survival of Patients with High-Grade Gastroenteropancreatic Neuroendocrine Neoplasms: An Investigation of Ensemble Feature Selection in the Prediction of Overall Survival [0.0]
Ensemble feature selectors allow the user to identify such features in datasets with low sample sizes.
RENT and UBayFS are capable of integrating expert knowledge a priori in the feature selection process.
Our results demonstrate that both feature selectors allow accurate predictions, and that expert knowledge has a stabilizing effect on the feature set.
arXiv Detail & Related papers (2023-02-20T17:08:03Z)
- Ensemble feature selection with clustering for analysis of high-dimensional, correlated clinical data in the search for Alzheimer's disease biomarkers [0.0]
We present a novel framework to create feature selection ensembles from multivariate feature selectors.
We take into account the biases produced by groups of correlated features, using agglomerative hierarchical clustering in a pre-processing step.
These methods were applied to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood.
arXiv Detail & Related papers (2022-07-06T01:03:50Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech-based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Domain Invariant Model with Graph Convolutional Network for Mammogram Classification [49.691629817104925]
We propose a novel framework, namely Domain Invariant Model with Graph Convolutional Network (DIM-GCN).
We first propose a Bayesian network, which explicitly decomposes the latent variables into disease-related and other disease-irrelevant parts that are provably disentangled from each other.
To better capture the macroscopic features, we leverage the observed clinical attributes as a goal for reconstruction, via a Graph Convolutional Network (GCN).
arXiv Detail & Related papers (2022-04-21T08:23:44Z)
- Multi-class versus One-class classifier in spontaneous speech analysis oriented to Alzheimer Disease diagnosis [58.720142291102135]
The aim of our project is to contribute to earlier diagnosis of AD and better estimates of its severity by using automatic analysis performed through new biomarkers extracted from the speech signal.
The use of information about outlier and Fractal Dimension features improves the system performance.
arXiv Detail & Related papers (2022-03-21T09:57:20Z)
- Active Selection of Classification Features [0.0]
Auxiliary data, such as demographics, might help in selecting a smaller sample that comprises the individuals with the most informative MRI scans.
We propose two utility-based approaches for this problem, and evaluate their performance on three public real-world benchmark datasets.
arXiv Detail & Related papers (2021-02-26T18:19:08Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Analysis of ensemble feature selection for correlated high-dimensional RNA-Seq cancer data [0.24366811507669126]
This study compares two approaches for the discovery of relevant variables.
The most informative features are identified using four feature selection algorithms.
Unfortunately, models built on feature sets obtained from the ensemble of feature selection algorithms were no better than models developed on feature sets obtained from individual algorithms.
arXiv Detail & Related papers (2020-04-28T20:38:53Z)