Deep neural networks with controlled variable selection for the
identification of putative causal genetic variants
- URL: http://arxiv.org/abs/2109.14719v1
- Date: Wed, 29 Sep 2021 20:57:48 GMT
- Authors: Peyman H. Kassani, Fred Lu, Yann Le Guen and Zihuai He
- Score: 0.43012765978447565
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks (DNN) have been used successfully in many scientific
problems for their high prediction accuracy, but their application to genetic
studies remains challenging due to their poor interpretability. In this paper,
we consider the problem of scalable, robust variable selection in DNN for the
identification of putative causal genetic variants in genome sequencing
studies. We identified a pronounced randomness in feature selection in DNN due
to its stochastic nature, which may hinder interpretability and give rise to
misleading results. We propose an interpretable neural network model,
stabilized using ensembling, with controlled variable selection for genetic
studies. The merit of the proposed method includes: (1) flexible modelling of
the non-linear effect of genetic variants to improve statistical power; (2)
multiple knockoffs in the input layer to rigorously control false discovery
rate; (3) hierarchical layers to substantially reduce the number of weight
parameters and activations to improve computational efficiency; (4)
de-randomized feature selection to stabilize identified signals. We evaluated
the proposed method in extensive simulation studies and applied it to the
analysis of Alzheimer disease genetics. We showed that the proposed method,
when compared to conventional linear and nonlinear methods, can lead to
substantially more discoveries.
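The knockoff idea behind merit (2) can be sketched concretely: each variant gets a synthetic "knockoff" copy, and a feature is reported only when its importance clearly exceeds that of its copy, with the selection threshold chosen to bound the estimated false discovery proportion. The sketch below is a minimal single-knockoff filter on toy importance scores; the paper's method uses multiple knockoffs in the input layer and ensembled, network-derived importances, which this does not reproduce.

```python
import numpy as np

def knockoff_select(importance_orig, importance_ko, fdr=0.1):
    """Single-knockoff filter: select features whose original importance
    beats the knockoff importance by a data-driven threshold."""
    # antisymmetric feature statistic: large positive -> likely signal
    w = importance_orig - importance_ko
    # candidate thresholds are the observed |w| values, ascending
    for t in np.sort(np.abs(w[w != 0])):
        # knockoff+ estimate of the false discovery proportion at t
        fdp = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp <= fdr:
            return np.where(w >= t)[0]
    return np.array([], dtype=int)

# toy example: the first 5 features carry signal, the remaining 45 are noise
rng = np.random.default_rng(0)
orig = np.concatenate([rng.uniform(2, 3, 5), rng.uniform(0, 0.5, 45)])
ko = rng.uniform(0, 0.5, 50)  # knockoff importances for all 50 features
selected = knockoff_select(orig, ko, fdr=0.2)
```

Because noise features are roughly as important as their knockoffs, their statistics hover near zero and are filtered out, while genuine signals survive at a threshold that keeps the estimated FDR below the target.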
Related papers
- Targeted Cause Discovery with Data-Driven Learning [66.86881771339145]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations.
We employ a neural network trained to identify causality through supervised learning on simulated data.
Empirical results demonstrate the effectiveness of our method in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z)
- Interpreting artificial neural networks to detect genome-wide association signals for complex traits [0.0]
Investigating the genetic architecture of complex diseases is challenging due to the highly polygenic and interactive landscape of genetic and environmental factors.
We trained artificial neural networks for predicting complex traits using both simulated and real genotype/phenotype datasets.
arXiv Detail & Related papers (2024-07-26T15:20:42Z)
- Identifying the Attractors of Gene Regulatory Networks from Expression Data under Uncertainty: An Interpretable Approach [0.0]
Given a temporal gene expression profile of a real gene regulatory network, how can the attractors be robustly identified?
This paper addresses this question using a novel approach based on Zadeh's Computing with Words paradigm.
The proposed scheme could effectively identify the attractors from temporal gene expression data in terms of both fuzzy logic-based and linguistic descriptions.
arXiv Detail & Related papers (2024-03-16T20:56:22Z)
- Predicting loss-of-function impact of genetic mutations: a machine learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
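The five cross-validated metrics listed in this entry map directly onto scikit-learn scorer names. A minimal sketch on synthetic regression data follows; the features and targets are stand-ins, not the mutation attributes or LoFtool scores from the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# synthetic stand-in for mutation-attribute features and LoFtool-like targets
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

scoring = {
    "r2": "r2",
    "mse": "neg_mean_squared_error",
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
    "explained_variance": "explained_variance",
}
cv = cross_validate(Ridge(alpha=1.0), X, y, cv=5, scoring=scoring)

# five-fold cross-validated averages, as in the paper's evaluation protocol
summary = {name: np.mean(cv[f"test_{name}"]) for name in scoring}
# note: scikit-learn reports error metrics negated, so higher is better
```

Averaging the per-fold test scores reproduces the "five-fold cross-validated averages" reporting style; the negated error scorers just need a sign flip for presentation.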
arXiv Detail & Related papers (2024-01-26T19:27:38Z)
- An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis [0.8221435109014762]
We develop a kernel-based neural network model (KNN) that synergizes the strengths of linear mixed models with conventional neural networks.
A MINQUE-based test assesses the joint association of genetic variants with the phenotype.
Two additional tests evaluate and interpret linear and non-linear/non-additive genetic effects.
arXiv Detail & Related papers (2023-12-06T05:02:28Z)
- DiscoGen: Learning to Discover Gene Regulatory Networks [30.83574314774383]
Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology.
Recent advances in neural network-based causal discovery methods have significantly improved causal discovery.
Applying state-of-the-art causal discovery methods in biology poses challenges, such as noisy data and a large number of samples.
We introduce DiscoGen, a neural network-based GRN discovery method that can denoise gene expression measurements and handle interventional data.
arXiv Detail & Related papers (2023-04-12T13:02:49Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs from 56 brain imaging quantitative traits (QTs).
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
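The lasso-versus-random-forest contrast in this entry can be illustrated on synthetic data: when a target depends nonlinearly on the features, a forest's cross-validated RMSE beats a sparse linear model's. Everything below is illustrative, not the rfPhen2Gen setup or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# hypothetical stand-in: 56 imaging-QT-like features predicting one target
X = rng.normal(size=(300, 56))
# nonlinear signal (quadratic term) that a linear model cannot capture
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

def cv_rmse(model):
    """Five-fold cross-validated RMSE (scikit-learn negates error scores)."""
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

rmse_lasso = cv_rmse(Lasso(alpha=0.01))
rmse_rf = cv_rmse(RandomForestRegressor(n_estimators=200, random_state=0))
```

The lasso recovers only the linear term, so its RMSE stays near the standard deviation of the unexplained quadratic component, while the forest models the curvature and lands well below it, echoing why random forests surfaced SNPs the linear models missed.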
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce EINNs, a new class of physics-informed neural networks crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Stochasticity in Neural ODEs: An Empirical Study [68.8204255655161]
Regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization.
We show that data augmentation during training improves the performance of both the deterministic and stochastic versions of the same model.
However, the improvement obtained by data augmentation alone eliminates the empirical gains of stochastic regularization, making the difference in performance between neural ODE and neural SDE negligible.
arXiv Detail & Related papers (2020-02-22T22:12:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.