Related papers: GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations

GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations

URL: http://arxiv.org/abs/2602.18769v1
Date: Sat, 21 Feb 2026 09:26:44 GMT
Title: GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations
Authors: Osman Onur Kuzucu, Tunca Doğan,
Abstract summary: Understanding disease-gene associations is essential to unravelling disease mechanisms and advancing diagnostics and therapeutics.<n>To address limitations in existing models, we propose GLaDiGAtor, a novel GNN framework with an encoder-decoder architecture for disease-gene association prediction.<n>GLaDiGAtor constructs a heterogeneous biological graph integrating gene-gene, disease-disease, and gene-disease interactions from curated databases.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding disease-gene associations is essential for unravelling disease mechanisms and advancing diagnostics and therapeutics. Traditional approaches based on manual curation and literature review are labour-intensive and not scalable, prompting the use of machine learning on large biomedical data. In particular, graph neural networks (GNNs) have shown promise for modelling complex biological relationships. To address limitations in existing models, we propose GLaDiGAtor (Graph Learning-bAsed DIsease-Gene AssociaTiOn pRediction), a novel GNN framework with an encoder-decoder architecture for disease-gene association prediction. GLaDiGAtor constructs a heterogeneous biological graph integrating gene-gene, disease-disease, and gene-disease interactions from curated databases, and enriches each node with contextual features from well-known language models (ProtT5 for protein sequences and BioBERT for disease text). In evaluations, our model achieves superior predictive accuracy and generalisation, outperforming 14 existing methods. Literature-supported case studies confirm the biological relevance of high-confidence novel predictions, highlighting GLaDiGAtor's potential to discover candidate disease genes. These results underscore the power of graph convolutional networks in biomedical informatics and may ultimately facilitate drug discovery by revealing new gene-disease links. The source code and processed datasets are publicly available at https://github.com/HUBioDataLab/GLaDiGAtor.

Related papers

engGNN: A Dual-Graph Neural Network for Omics-Based Disease Classification and Feature Selection [5.54797146496798]
Graph Neural Networks (GNNs) offer a promising way to integrate prior knowledge by encoding feature relationships as graphs.<n>We propose the external and generated Graph Neural Network (engGNN), a dual-graph framework that jointly leverages both external known biological networks and data-driven generated graphs.<n>engGNN consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-01-20T23:18:56Z)
E-ABIN: an Explainable module for Anomaly detection in BIological Networks [1.7999333451993955]
E-ABIN is a general-purpose, explainable framework for Anomaly detection in Biological Networks.<n>It combines classical machine learning and graph-based deep learning techniques within a unified, user-friendly platform.<n>We demonstrate the utility of E-ABIN through case studies of bladder cancer and coeliac disease.
arXiv Detail & Related papers (2025-06-25T08:25:17Z)
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
A Systematic Evaluation of Knowledge Graph Embeddings for Gene-Disease Association Prediction [0.0]
This work introduces a novel framework for comparing the performance of link prediction versus node-pair classification tasks.<n>It also evaluates the impact of the semantic richness through a disease-specific ontology and additional links between evaluation.<n>Results show that enriching the encoded representation of diseases slightly improves performance, while additional links generate a greater impact.
arXiv Detail & Related papers (2025-04-11T11:11:35Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature [0.0]
This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a pre-trained BERT model on biomedical corpora. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples.
arXiv Detail & Related papers (2023-09-11T18:05:12Z)
Knowledge Graph Completion based on Tensor Decomposition for Disease Gene Prediction [2.838553480267889]
We construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end Knowledge graph completion model for Disease Gene Prediction. KDGene introduces an interaction module between the embeddings of entities and relations to tensor decomposition, which can effectively enhance the information interaction in biological knowledge.
arXiv Detail & Related papers (2023-02-18T13:57:44Z)
Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs. We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies. In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.