A statistical framework for GWAS of high dimensional phenotypes using
summary statistics, with application to metabolite GWAS
- URL: http://arxiv.org/abs/2303.10221v1
- Date: Fri, 17 Mar 2023 19:33:25 GMT
- Authors: Weiqiong Huang, Emily C. Hector, Joshua Cape, Chris McKennan
- Abstract summary: We develop a novel model, theoretical framework, and set of methods to perform Bayesian inference in GWAS of high dimensional phenotypes.
We demonstrate the utility of our procedure by applying it to metabolite GWAS.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent explosion of genetic and high dimensional biobank and 'omic' data
has provided researchers with the opportunity to investigate the shared genetic
origin (pleiotropy) of hundreds to thousands of related phenotypes. However,
existing methods for multi-phenotype genome-wide association studies (GWAS) do
not model pleiotropy, are only applicable to a small number of phenotypes, or
provide no way to perform inference. To add further complication, raw genetic
and phenotype data are rarely observed, meaning analyses must be performed on
GWAS summary statistics whose statistical properties in high dimensions are
poorly understood. We therefore developed a novel model, theoretical framework,
and set of methods to perform Bayesian inference in GWAS of high dimensional
phenotypes using summary statistics that explicitly model pleiotropy, beget
fast computation, and facilitate the use of biologically informed priors. We
demonstrate the utility of our procedure by applying it to metabolite GWAS,
where we develop new nonparametric priors for genetic effects on metabolite
levels that use known metabolic pathway information and foster interpretable
inference at the pathway level.
Related papers
- Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production [49.814615043389864]
We propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs.
We present the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms.
Our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
arXiv Detail & Related papers (2024-10-24T06:54:27Z)
- AI-driven multi-omics integration for multi-scale predictive modeling of causal genotype-environment-phenotype relationships [9.909750609459074]
We propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues.
This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict causal genotype-environment-phenotype relationships under various conditions.
arXiv Detail & Related papers (2024-07-08T21:23:25Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT [11.20254354103518]
We developed two types of models: PhenoBCBERT, a BERT-based model, and PhenoGPT, a GPT-based model.
We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO.
arXiv Detail & Related papers (2023-08-11T03:40:22Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging quantitative traits (QTs).
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Statistical quantification of confounding bias in predictive modelling [0.0]
I propose the partial and full confounder tests, which probe the null hypotheses of unconfounded and fully confounded models.
The tests provide a strict control for Type I errors and high statistical power, even for non-normally and non-linearly dependent predictions.
arXiv Detail & Related papers (2021-11-01T10:35:24Z)
- A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework.
Inspired by domain adaptation, CLEIT first learns the latent representation of the high-level domain, then uses it as a ground-truth embedding.
We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)