PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction
- URL: http://arxiv.org/abs/2509.14037v1
- Date: Wed, 17 Sep 2025 14:38:52 GMT
- Title: PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction
- Authors: Ranga Baminiwatte, Kazi Jewel Rana, Aaron J. Masino,
- Abstract summary: PhenoGnet is a graph-based contrastive learning framework designed to predict disease similarity.<n>Gene based embeddings achieved an AUCPR of 0.9012 and AUROC of 0.8764, outperforming existing state of the art methods.
- Score: 0.15293427903448018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding disease similarity is critical for advancing diagnostics, drug discovery, and personalized treatment strategies. We present PhenoGnet, a novel graph-based contrastive learning framework designed to predict disease similarity by integrating gene functional interaction networks with the Human Phenotype Ontology (HPO). PhenoGnet comprises two key components: an intra-view model that separately encodes gene and phenotype graphs using Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs), and a cross view model implemented as a shared weight multilayer perceptron (MLP) that aligns gene and phenotype embeddings through contrastive learning. The model is trained using known gene phenotype associations as positive pairs and randomly sampled unrelated pairs as negatives. Diseases are represented by the mean embeddings of their associated genes and/or phenotypes, and pairwise similarity is computed via cosine similarity. Evaluation on a curated benchmark of 1,100 similar and 866 dissimilar disease pairs demonstrates strong performance, with gene based embeddings achieving an AUCPR of 0.9012 and AUROC of 0.8764, outperforming existing state of the art methods. Notably, PhenoGnet captures latent biological relationships beyond direct overlap, offering a scalable and interpretable solution for disease similarity prediction. These results underscore its potential for enabling downstream applications in rare disease research and precision medicine.
Related papers
- GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations [0.0]
Understanding disease-gene associations is essential to unravelling disease mechanisms and advancing diagnostics and therapeutics.<n>To address limitations in existing models, we propose GLaDiGAtor, a novel GNN framework with an encoder-decoder architecture for disease-gene association prediction.<n>GLaDiGAtor constructs a heterogeneous biological graph integrating gene-gene, disease-disease, and gene-disease interactions from curated databases.
arXiv Detail & Related papers (2026-02-21T09:26:44Z) - GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z) - G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion [108.94237816552024]
We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA.<n>The model contains three novel components: 1) a MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
arXiv Detail & Related papers (2025-02-07T06:16:31Z) - CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy.
We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - PhenoLinker: Phenotype-Gene Link Prediction and Explanation using
Heterogeneous Graph Neural Networks [38.216545389032234]
We present PhenoLinker, capable of associating a score to a phenotype-gene relationship by using heterogeneous information networks and a convolutional neural network-based model for graphs.
This system can aid in the discovery of new associations and in the understanding of the consequences of human genetic variation.
arXiv Detail & Related papers (2024-02-02T11:35:21Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Un Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Graph Based Link Prediction between Human Phenotypes and Genes [5.1398743023989555]
Recent advances in the field of machine learning is efficient to predict these interactions between abnormal human phenotypes and genes.
In this study, we developed a framework to predict links between human phenotype ontology (HPO) and genes.
Compared to the other 4 methods LightGBM is able to find more accurate interaction/link between human phenotype & gene pairs.
arXiv Detail & Related papers (2021-05-25T14:47:07Z) - Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete
Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z) - A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with
Confounding Correction [28.364820868064893]
We propose the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction.
We show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals.
We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.
arXiv Detail & Related papers (2017-11-11T16:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.