Related papers: Graph Based Link Prediction between Human Phenotypes and Genes

URL: http://arxiv.org/abs/2105.11989v1
Date: Tue, 25 May 2021 14:47:07 GMT
Title: Graph Based Link Prediction between Human Phenotypes and Genes
Authors: Rushabh Patel, Yanhui Guo
Abstract summary: Recent advances in machine learning make it possible to predict interactions between abnormal human phenotypes and genes efficiently. In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Compared with the other four methods, LightGBM finds more accurate links between human phenotype and gene pairs.
Score: 5.1398743023989555
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Background: Deep phenotyping can be defined as learning genotype-phenotype associations and the history of human disease through detailed and precise analysis of phenotypic abnormalities. Understanding and detecting this interaction between phenotype and genotype is a fundamental step in translating precision medicine into clinical practice. Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently. Methods: In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Annotation data from heterogeneous knowledge resources, i.e., Orphanet, was used to parse human phenotype-gene associations. To generate embeddings for the nodes (HPO terms and genes), the node2vec algorithm was used. It samples nodes from this graph using random walks, then learns features over the sampled nodes to generate embeddings. These embeddings were used in the downstream task of predicting whether a link exists between two nodes, using five different supervised machine learning algorithms. Results: On the downstream link prediction task, the gradient boosting decision tree model (LightGBM) achieved the best AUROC (0.904) and AUCPR (0.784). In addition, LightGBM achieved the best weighted F1 score (0.87). Compared with the other four methods, LightGBM finds more accurate links between human phenotype and gene pairs.
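The random-walk sampling step that the abstract attributes to node2vec can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the toy edge list, the node names (`hpo_1`, `gene_a`, ...), the helper names, and the `p`/`q` values are all hypothetical placeholders.

```python
import random

# Hypothetical toy phenotype-gene associations (placeholders, not real annotations).
TOY_EDGES = [
    ("hpo_1", "gene_a"),
    ("hpo_1", "gene_b"),
    ("hpo_2", "gene_b"),
    ("hpo_2", "gene_c"),
    ("hpo_3", "gene_c"),
]


def build_graph(edges):
    """Build undirected adjacency sets from (phenotype, gene) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """Sample one second-order biased walk containing at most `length` nodes.

    p is the return parameter and q the in-out parameter: a step back to the
    previous node has weight 1/p, a step to a node adjacent to the previous
    node has weight 1, and any other step has weight 1/q.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted so results are reproducible under a seed
        if not nbrs:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    graph = build_graph(TOY_EDGES)
    for walk in sample_walks(graph, num_walks=1, length=6, p=0.5, q=2.0):
        print(" -> ".join(walk))
```

In a full pipeline, walks like these would typically be fed to a skip-gram model to learn node embeddings, and the embeddings of a phenotype and a gene would be combined into an edge feature vector (for example, by element-wise product) before training a classifier such as LightGBM. Those later steps are omitted here, and the specific choices named are assumptions rather than details confirmed by the abstract.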

Related papers

GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
Inferring genotype-phenotype maps using attention models [0.21990652930491852]
Predicting phenotype from genotype is a central challenge in genetics. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Here, we apply attention-based models to quantitative genetics.
arXiv Detail & Related papers (2025-04-14T16:32:17Z)
G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion [108.94237816552024]
We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA. The model contains three novel components: 1) an MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
arXiv Detail & Related papers (2025-02-07T06:16:31Z)
A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications [0.0]
This paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks). The model combines supervised and self-supervised learning, with a loss function that takes into account the learned embeddings and patterns with and without ground truth. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interaction prediction and Gene Ontology (GO) term prediction.
arXiv Detail & Related papers (2025-01-21T17:26:15Z)
GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation [29.93863082158739]
Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations. We propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes.
arXiv Detail & Related papers (2024-10-17T02:58:57Z)
CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy. We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z)
Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs. We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon. To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
arXiv Detail & Related papers (2024-03-15T14:26:53Z)
PhenoLinker: Phenotype-Gene Link Prediction and Explanation using Heterogeneous Graph Neural Networks [38.216545389032234]
We present PhenoLinker, capable of associating a score to a phenotype-gene relationship by using heterogeneous information networks and a convolutional neural network-based model for graphs. This system can aid in the discovery of new associations and in the understanding of the consequences of human genetic variation.
arXiv Detail & Related papers (2024-02-02T11:35:21Z)
Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles. It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner. These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs. We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views guided respectively by node attributes and graph topologies. In this new approach, we adopt distinct, well-suited attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain. Recent works have shown that considering relationships between input data samples has a positive regularizing effect on the downstream task. We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
arXiv Detail & Related papers (2022-04-01T10:01:37Z)
rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs. SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest. Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
Handling highly correlated genes in prediction analysis of genomic studies [0.0]
High correlation among genes introduces technical problems, such as multi-collinearity issues, leading to unreliable prediction models. We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection. Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under condition change.
arXiv Detail & Related papers (2020-07-05T22:14:03Z)
Heterogeneous Graph Neural Networks for Malicious Account Detection [64.0046412312209]
We present GEM, the first heterogeneous graph neural network approach for detecting malicious accounts. We learn discriminative embeddings from heterogeneous account-device graphs based on two fundamental weaknesses of attackers, i.e., device aggregation and activity aggregation. Experiments show that our approaches consistently achieve promising results compared with competitive methods over time.
arXiv Detail & Related papers (2020-02-27T18:26:44Z)
A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction [28.364820868064893]
We propose the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction. We show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.
arXiv Detail & Related papers (2017-11-11T16:01:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.