Graph Based Link Prediction between Human Phenotypes and Genes
- URL: http://arxiv.org/abs/2105.11989v1
- Date: Tue, 25 May 2021 14:47:07 GMT
- Title: Graph Based Link Prediction between Human Phenotypes and Genes
- Authors: Rushabh Patel, Yanhui Guo
- Abstract summary: Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently.
In this study, we developed a framework to predict links between human phenotype ontology (HPO) and genes.
Compared with the other four methods, LightGBM finds more accurate links between human phenotype and gene pairs.
- Score: 5.1398743023989555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Deep phenotyping can be defined as learning
genotype-phenotype associations and the history of human disease through
detailed and precise analysis of phenotypic abnormalities. Understanding and
detecting this interaction between phenotype and genotype is a fundamental step
in translating precision medicine into clinical practice. Recent advances in
machine learning make it possible to predict these interactions between
abnormal human phenotypes and genes efficiently.
Methods: In this study, we developed a framework to predict links between
Human Phenotype Ontology (HPO) terms and genes. Human phenotype-gene
associations were parsed from annotation data in heterogeneous knowledge
resources, i.e., Orphanet. Embeddings for the nodes (HPO terms and genes) were
generated with node2vec, an algorithm that samples nodes from the graph via
random walks and then learns features over the sampled nodes. These embeddings
were used in the downstream task of predicting whether a link exists between
two nodes, using five different supervised machine learning algorithms.
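The random-walk sampling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an unweighted, undirected graph built from (HPO term, gene) pairs and uses the node2vec return parameter `p` and in-out parameter `q` to bias each second-order step. It leaves out the skip-gram training that turns walks into embeddings (node2vec uses word2vec for that step). The helper names (`build_graph`, `node2vec_walk`, `generate_walks`, `hadamard`) and the toy node labels are hypothetical. The Hadamard product at the end is one of the edge operators proposed in the node2vec paper for turning two node embeddings into a link feature; the abstract does not say which operator this study used.

```python
import random
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from (hpo_term, gene) association pairs."""
    adj = defaultdict(set)
    for hpo, gene in pairs:
        adj[hpo].add(gene)
        adj[gene].add(hpo)
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # isolated node: nothing to walk to
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward, distance 2 from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


def hadamard(emb, u, v):
    """Element-wise product of two node embeddings, used as an edge feature."""
    return [a * b for a, b in zip(emb[u], emb[v])]


if __name__ == "__main__":
    # Hypothetical toy associations, not data from the study.
    pairs = [("HPO_a", "GENE_x"), ("HPO_a", "GENE_y"), ("HPO_b", "GENE_y")]
    graph = build_graph(pairs)
    for w in generate_walks(graph, num_walks=1, length=4, p=1.0, q=0.5):
        print(" -> ".join(w))
```

With `p = q = 1` every step is uniform over neighbours (an unbiased, DeepWalk-style walk); lowering `q` pushes walks outward, while lowering `p` makes them revisit the previous node more often.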
Results: In the downstream link prediction task, the gradient boosting decision
tree model LightGBM achieved an optimal AUROC of 0.904 and AUCPR of 0.784, as
well as an optimal weighted F1 score of 0.87. Compared with the other four
methods, LightGBM found more accurate links between human phenotype and gene
pairs.
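The three reported metrics have standard definitions that can be written out directly. The sketch below assumes how such scores are typically computed; it is not the authors' evaluation code. AUROC uses the rank (Mann-Whitney) formulation, AUCPR is approximated by average precision (the estimator behind scikit-learn's `average_precision_score`; the study may have used another approximation, such as trapezoidal integration), and F1 is averaged over classes weighted by class support, as `f1_score(..., average="weighted")` does. Labels are assumed to be 0/1 for absent/present links, and the toy scores are hypothetical.

```python
def auroc(labels, scores):
    """P(random positive scores above random negative); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Sum over thresholds of (recall gain) * precision, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this threshold before scoring it.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def weighted_f1(labels, preds):
    """Per-class F1, averaged with each class weighted by its support."""
    n = len(labels)
    total = 0.0
    for c in sorted(set(labels)):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = sum(1 for y in labels if y == c)
        total += f1 * support / n
    return total


if __name__ == "__main__":
    y = [1, 0, 1, 0]              # hypothetical link labels
    s = [0.9, 0.8, 0.7, 0.1]      # hypothetical classifier scores
    yhat = [int(v >= 0.5) for v in s]
    print(round(auroc(y, s), 4), round(average_precision(y, s), 4),
          round(weighted_f1(y, yhat), 4))
    # prints 0.75 0.8333 0.7333
```

AUROC and AUCPR are computed from the continuous scores, whereas weighted F1 needs hard predictions, so it also depends on the chosen decision threshold (0.5 here, an assumption).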
Related papers
- GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation [29.93863082158739]
Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations.
We propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes.
(2024-10-17T02:58:57Z)
- CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy.
We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
(2024-10-10T01:01:10Z)
- Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
(2024-03-15T14:26:53Z)
- PhenoLinker: Phenotype-Gene Link Prediction and Explanation using Heterogeneous Graph Neural Networks [38.216545389032234]
We present PhenoLinker, capable of associating a score to a phenotype-gene relationship by using heterogeneous information networks and a convolutional neural network-based model for graphs.
This system can aid in the discovery of new associations and in the understanding of the consequences of human genetic variation.
(2024-02-02T11:35:21Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
(2023-01-07T18:36:44Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop, for the first time, a novel and robust heterogeneous graph contrastive learning approach, HGCL, which introduces two views guided respectively by node attributes and graph topologies.
In this approach, we adopt distinct attribute and topology fusion mechanisms, each suited to its view, which helps mine relevant information from attributes and topologies separately.
(2022-04-30T12:57:02Z)
- Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain.
Recent works have shown that considering relationships between input data samples has a positive regularizing effect on the downstream task.
We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
(2022-04-01T10:01:37Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs from 56 brain imaging quantitative traits (QTs).
SNPs within the known Alzheimer's disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
(2022-03-31T20:15:22Z)
- Handling highly correlated genes in prediction analysis of genomic studies [0.0]
High correlation among genes introduces technical problems, such as multi-collinearity issues, leading to unreliable prediction models.
We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection.
Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under condition change.
(2020-07-05T22:14:03Z)
- Heterogeneous Graph Neural Networks for Malicious Account Detection [64.0046412312209]
We present GEM, the first heterogeneous graph neural network approach for detecting malicious accounts.
We learn discriminative embeddings from heterogeneous account-device graphs based on two fundamental weaknesses of attackers, i.e. device aggregation and activity aggregation.
Experiments show that our approaches consistently achieve promising results compared with competitive methods over time.
(2020-02-27T18:26:44Z)
- A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction [28.364820868064893]
We propose the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction.
We show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals.
We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.
(2017-11-11T16:01:53Z)