Inferring genotype-phenotype maps using attention models
- URL: http://arxiv.org/abs/2504.10388v1
- Date: Mon, 14 Apr 2025 16:32:17 GMT
- Title: Inferring genotype-phenotype maps using attention models
- Authors: Krishna Rijal, Caroline M. Holmes, Samantha Petti, Gautam Reddy, Michael M. Desai, Pankaj Mehta,
- Abstract summary: Predicting phenotype from genotype is a central challenge in genetics.<n>Recent advances in machine learning, particularly attention-based models, offer a promising alternative.<n>Here, we apply attention-based models to quantitative genetics.
- Score: 0.21990652930491852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting phenotype from genotype is a central challenge in genetics. Traditional approaches in quantitative genetics typically analyze this problem using methods based on linear regression. These methods generally assume that the genetic architecture of complex traits can be parameterized in terms of an additive model, where the effects of loci are independent, plus (in some cases) pairwise epistatic interactions between loci. However, these models struggle to analyze more complex patterns of epistasis or subtle gene-environment interactions. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Initially developed for natural language processing, attention-based models excel at capturing context-dependent interactions and have shown exceptional performance in predicting protein structure and function. Here, we apply attention-based models to quantitative genetics. We analyze the performance of this attention-based approach in predicting phenotype from genotype using simulated data across a range of models with increasing epistatic complexity, and using experimental data from a recent quantitative trait locus mapping study in budding yeast. We find that our model demonstrates superior out-of-sample predictions in epistatic regimes compared to standard methods. We also explore a more general multi-environment attention-based model to jointly analyze genotype-phenotype maps across multiple environments and show that such architectures can be used for "transfer learning" - predicting phenotypes in novel environments with limited training data.
Related papers
- Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology [0.29998889086656577]
We propose a novel methodology for scalable Bayesian inference in hierarchical mixed-effects models.
Our framework first constructs amortized approximations of the likelihood and the posterior distribution, which are then rapidly refined for each individual dataset.
Our approach proved to be both fast and competitive in terms of statistical accuracy.
arXiv Detail & Related papers (2025-04-15T15:18:58Z) - G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion [108.94237816552024]
We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA.<n>The model contains three novel components: 1) a MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
arXiv Detail & Related papers (2025-02-07T06:16:31Z) - Stacked ensemble\-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences.
In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
arXiv Detail & Related papers (2024-09-03T09:14:21Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Generalising sequence models for epigenome predictions with tissue and
assay embeddings [1.9999259391104391]
We show that strong correlation can be achieved across a large range of experimental conditions by integrating tissue and assay embeddings into a Contextualised Genomic Network (CGN)
We exhibit the efficacy of our approach across a broad set of epigenetic profiles and provide the first insights into the effect of genetic variants on epigenetic sequence model training.
arXiv Detail & Related papers (2023-08-22T10:34:19Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Un Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - Hypothesis Testing using Causal and Causal Variational Generative Models [0.0]
Causal Gen and Causal Variational Gen can utilize nonparametric structural causal knowledge combined with a deep learning functional approximation.
We show how, using a deliberate (non-random) split of training and testing data, these models can generalize better to similar, but out-of-distribution data points.
We validate our methods on a synthetic pendulum dataset, as well as a trauma surgery ground level fall dataset.
arXiv Detail & Related papers (2022-10-20T13:46:15Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Mycorrhiza: Genotype Assignment usingPhylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.