Deep Learning for Efficient GWAS Feature Selection
- URL: http://arxiv.org/abs/2312.15055v1
- Date: Fri, 22 Dec 2023 20:35:47 GMT
- Title: Deep Learning for Efficient GWAS Feature Selection
- Authors: Kexuan Li
- Abstract summary: This paper introduces an extension to the feature selection methodology proposed by Mirzaei et al.
Our extended approach enhances the original method by introducing a Frobenius norm penalty into the student network.
Operating seamlessly in both supervised and unsupervised settings, our method employs two key neural networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Genome-Wide Association Studies (GWAS) face unique challenges in the era of
big genomics data, particularly when dealing with ultra-high-dimensional
datasets where the number of genetic features significantly exceeds the
available samples. This paper introduces an extension to the feature selection
methodology proposed by Mirzaei et al. (2020), specifically tailored to tackle
the intricacies associated with ultra-high-dimensional GWAS data. Our extended
approach enhances the original method by introducing a Frobenius norm penalty
into the student network, augmenting its capacity to adapt to scenarios
characterized by a multitude of features and limited samples. Operating
seamlessly in both supervised and unsupervised settings, our method employs two
key neural networks. The first leverages an autoencoder or supervised
autoencoder for dimension reduction, extracting salient features from the
ultra-high-dimensional genomic data. The second network, a regularized
feed-forward model with a single hidden layer, is designed for precise feature
selection. The introduction of the Frobenius norm penalty in the student
network significantly boosts the method's resilience to the challenges posed by
ultra-high-dimensional GWAS datasets. Experimental results showcase the
efficacy of our approach in feature selection for GWAS data. The method not
only handles the inherent complexities of ultra-high-dimensional settings but
also demonstrates superior adaptability to the nuanced structures present in
genomics data. The flexibility and versatility of our proposed methodology are
underscored by its successful performance across a spectrum of experiments.
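As a concrete illustration of the two-network scheme described above, here is a minimal NumPy sketch. It is not the authors' implementation: PCA stands in for the (supervised) autoencoder teacher, the student is a single-hidden-layer feed-forward network trained by plain gradient descent with a row-wise sparsity penalty plus the Frobenius norm penalty, and all dimensions and hyperparameters are illustrative.

```python
# Hypothetical sketch of teacher-student feature selection with a Frobenius
# norm penalty on the student network (NumPy only; PCA replaces the autoencoder).
import numpy as np

rng = np.random.default_rng(0)

n, p, k, h = 100, 50, 3, 8           # samples, features, teacher dims, hidden units
X = rng.standard_normal((n, p))
X[:, :5] *= 3.0                       # first 5 features carry most of the variance

# Teacher: top-k PCA scores stand in for the autoencoder's low-dimensional codes.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T
Z /= Z.std(axis=0)                    # normalize the teacher targets

# Student: single hidden layer, x -> tanh(x @ W1) @ W2.
W1 = 0.1 * rng.standard_normal((p, h))
W2 = 0.1 * rng.standard_normal((h, k))
lam_sparse, lam_frob, lr = 1e-3, 1e-3, 0.05   # illustrative values

losses = []
for _ in range(500):
    H = np.tanh(X @ W1)
    R = H @ W2 - Z                            # residual vs. teacher features
    row_norms = np.linalg.norm(W1, axis=1, keepdims=True) + 1e-12
    loss = (0.5 / n) * np.sum(R**2) \
         + lam_sparse * row_norms.sum() \
         + lam_frob * np.sum(W1**2)           # Frobenius norm penalty (the extension)
    losses.append(loss)
    # Backpropagation by hand.
    dY = R / n
    dW2 = H.T @ dY
    dPre = (dY @ W2.T) * (1.0 - H**2)
    dW1 = X.T @ dPre + lam_sparse * W1 / row_norms + 2.0 * lam_frob * W1
    W1 -= lr * dW1
    W2 -= lr * dW2

# Feature importance: L2 norm of each input feature's outgoing weights in W1.
importance = np.linalg.norm(W1, axis=1)
print("loss:", losses[0], "->", losses[-1])
print("top-5 features by importance:", np.argsort(importance)[::-1][:5])
```

Features whose first-layer weight rows survive the sparsity pressure are selected; the Frobenius term additionally shrinks all weights, which is the stabilizing effect the abstract attributes to it in the many-features, few-samples regime.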
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - Diffusion-Based Neural Network Weights Generation
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z) - FedSDG-FS: Efficient and Secure Feature Selection for Vertical Federated Learning
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features for largely overlapping sets of data samples, to jointly train a useful global model.
Feature selection (FS) is important to VFL, yet it remains an open research problem: existing FS works designed for VFL assume prior knowledge either of the number of noisy features or of the post-training threshold of useful features.
We propose the Federated Dual-Gate based Feature Selection (FedSDG-FS) approach. It uses a Gaussian dual-gate to efficiently approximate the probability of a feature being selected while preserving privacy.
arXiv Detail & Related papers (2023-02-21T03:09:45Z) - Graph Convolutional Network-based Feature Selection for High-dimensional and Low-sample Size Data
We present a deep learning-based method - GRAph Convolutional nEtwork feature Selector (GRACES) - to select important features for HDLSS data.
We demonstrate empirical evidence that GRACES outperforms other feature selection methods on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-11-25T14:46:36Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Deep Feature Screening: Feature Selection for Ultra High-Dimensional
Data via Deep Neural Networks [4.212520096619388]
We propose a novel two-step nonparametric approach called Deep Feature Screening (DeepFS).
DeepFS can identify significant features with high precision for ultra high-dimensional, low-sample-size data.
The superiority of DeepFS is demonstrated via extensive simulation studies and real data analyses.
arXiv Detail & Related papers (2022-04-04T17:51:49Z) - Adaptive Memory Networks with Self-supervised Learning for Unsupervised
Anomaly Detection [54.76993389109327]
Unsupervised anomaly detection aims to build models to detect unseen anomalies by only training on the normal data.
We propose a novel approach called Adaptive Memory Network with Self-supervised Learning (AMSL) to address these challenges.
AMSL incorporates a self-supervised learning module to learn general normal patterns and an adaptive memory fusion module to learn rich feature representations.
arXiv Detail & Related papers (2022-01-03T03:40:21Z) - Pure Exploration in Kernel and Neural Bandits [90.23165420559664]
We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms.
To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space.
arXiv Detail & Related papers (2021-06-22T19:51:59Z) - Feature Selection Based on Sparse Neural Network Layer with Normalizing
Constraints [0.0]
We propose a new neural-network-based feature selection approach that introduces two constraints whose satisfaction leads to a sparse FS layer.
The results confirm that proposed Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints (SNEL-FS) is able to select the important features and yields superior performance compared to other conventional FS methods.
arXiv Detail & Related papers (2020-12-11T14:14:33Z) - Image-based Automated Species Identification: Can Virtual Data
Augmentation Overcome Problems of Insufficient Sampling? [0.0]
We present a two-level data augmentation approach to automated visual species identification.
The first level applies classic data augmentation approaches and the generation of fake images.
The second level of data augmentation employs synthetic additional sampling in feature space by an oversampling algorithm in vector space.
arXiv Detail & Related papers (2020-10-18T15:44:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.