Related papers: G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion

URL: http://arxiv.org/abs/2502.04684v3
Date: Mon, 10 Mar 2025 03:08:27 GMT
Title: G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
Authors: Mengdi Liu, Zhangyang Gao, Hong Chang, Stan Z. Li, Shiguang Shan, Xilin Chen
Abstract summary: We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA. The model contains three novel components: 1) a MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
Score: 108.94237816552024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding how genes influence phenotype across species is a fundamental challenge in genetic engineering, which will facilitate advances in various fields such as crop breeding, conservation biology, and personalized medicine. However, current phenotype prediction models are limited to individual species and expensive phenotype labeling process, making the genotype-to-phenotype prediction a highly domain-dependent and data-scarce problem. To this end, we suggest taking images as morphological proxies, facilitating cross-species generalization through large-scale multimodal pretraining. We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA considering two critical evolutionary signals, i.e., multiple sequence alignments (MSA) and environmental contexts. The model contains three novel components: 1) a MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency. Extensive experiments show that integrating evolutionary signals with environmental context enriches the model's understanding of phenotype variability across species, thereby offering a valuable and promising exploration into advanced AI-assisted genomic analysis.

Related papers

Flow Matching Meets Biology and Life Science: A Survey [65.2146737141455]
Flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling. This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains.
arXiv Detail & Related papers (2025-07-23T17:44:29Z)
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
Inferring genotype-phenotype maps using attention models [0.21990652930491852]
Predicting phenotype from genotype is a central challenge in genetics. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Here, we apply attention-based models to quantitative genetics.
arXiv Detail & Related papers (2025-04-14T16:32:17Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy. We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z)
Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling. We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
On The Nature Of The Phenotype In Tree Genetic Programming [3.8642945120580703]
We discuss the basic concepts of genotypes and phenotypes in tree-based GP (TGP). We then analyze their behavior using five benchmark datasets. To generate phenotypes, we provide a unique technique for removing semantically ineffective code from GP trees.
arXiv Detail & Related papers (2024-02-12T19:19:29Z)
Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles. It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner. These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
Probabilistic Genotype-Phenotype Maps Reveal Mutational Robustness of RNA Folding, Spin Glasses, and Quantum Circuits [0.0]
We introduce probabilistic genotype-phenotype maps, where each genotype maps to a vector of phenotype probabilities. We study three model systems to show that PrGP maps offer a generalized framework which can handle uncertainty emerging from various physical sources. We derive an analytical theory for the behavior of PrGP robustness, and we demonstrate that the theory is highly predictive of empirical robustness.
arXiv Detail & Related papers (2023-01-04T23:09:38Z)
Phenotype Search Trajectory Networks for Linear Genetic Programming [8.079719491562305]
Neutrality is the observation that some mutations do not lead to phenotypic changes. We study the search trajectories of a genetic programming system as graph-based models. We measure the characteristics of phenotypes including their genotypic abundance and Kolmogorov complexity.
arXiv Detail & Related papers (2022-11-15T21:20:50Z)
Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction [8.828330486848753]
We develop a new method, called NeurHap, that combines graph representation learning with optimization. Our experiments demonstrate substantially better performance of NeurHap in real and synthetic datasets compared to competing approaches.
arXiv Detail & Related papers (2022-10-21T12:53:09Z)
rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs. SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest. Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework. Inspired by domain adaptation, CLEIT first learns the latent representation of high-level domain then uses it as ground-truth embedding. We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.