Feature Selection integrated Deep Learning for Ultrahigh Dimensional and
Highly Correlated Feature Space
- URL: http://arxiv.org/abs/2209.07011v2
- Date: Sun, 18 Sep 2022 17:10:33 GMT
- Title: Feature Selection integrated Deep Learning for Ultrahigh Dimensional and
Highly Correlated Feature Space
- Authors: Arkaprabha Ganguli, Tapabrata Maiti
- Abstract summary: We propose a novel screening and cleaning strategy with the aid of deep learning for the cluster-level discovery of highly correlated predictors with a controlled error rate.
A thorough empirical evaluation over a wide range of simulated scenarios demonstrates the effectiveness of the proposed method by achieving high power while having a minimal number of false discoveries.
- Score: 0.456877715768796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, deep learning has been a topic of interest in almost all
disciplines due to its impressive empirical success in analyzing complex data
sets, such as imaging, genetics, climate, and medical data. While most of the
developments are treated as black-box machines, there is an increasing interest
in interpretable, reliable, and robust deep learning models applicable to a
broad class of applications. Feature-selected deep learning has proven
promising in this regard. However, recent developments do not address feature
selection in ultra-high dimensional, highly correlated feature spaces with a
high noise level. In this article, we propose a novel screening
and cleaning strategy with the aid of deep learning for the cluster-level
discovery of highly correlated predictors with a controlled error rate. A
thorough empirical evaluation over a wide range of simulated scenarios
demonstrates the effectiveness of the proposed method by achieving high power
while having a minimal number of false discoveries. Furthermore, we applied
the algorithm to the riboflavin (vitamin $B_2$) production dataset to
understand the possible genetic association with riboflavin
production. The gain of the proposed methodology is illustrated by its
lower prediction error compared to other state-of-the-art methods.
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A
Benchmarking Study [0.6291443816903801]
This paper evaluates a diverse array of machine learning-based anomaly detection algorithms.
The paper contributes an unbiased comparison of various anomaly detection algorithms.
arXiv Detail & Related papers (2024-02-11T19:12:51Z) - Snapshot Spectral Clustering -- a costless approach to deep clustering
ensembles generation [0.0]
This paper proposes a novel deep clustering ensemble method - Snapshot Spectral Clustering.
It is designed to maximize the gain from combining multiple data views while minimizing the computational costs of creating the ensemble.
arXiv Detail & Related papers (2023-07-17T16:01:22Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- CustOmics: A versatile deep-learning based strategy for multi-omics
integration [0.0]
This paper presents a novel strategy to build a customizable autoencoder model that adapts to the dataset used in the case of high-dimensional multi-source integration.
We will assess the impact of integration strategies on the latent representation and combine the best strategies to propose a new method, CustOmics.
arXiv Detail & Related papers (2022-09-12T14:20:29Z)
- Continual Learning with Bayesian Model based on a Fixed Pre-trained
Feature Extractor [55.9023096444383]
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes.
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv Detail & Related papers (2022-04-28T08:41:51Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Siloed Federated Learning for Multi-Centric Histopathology Datasets [0.17842332554022694]
This paper proposes a novel federated learning approach for deep learning architectures in the medical domain.
Local-statistic batch normalization (BN) layers are introduced, resulting in collaboratively-trained, yet center-specific models.
We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets.
arXiv Detail & Related papers (2020-08-17T15:49:30Z)