VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks
- URL: http://arxiv.org/abs/2505.11185v1
- Date: Fri, 16 May 2025 12:43:04 GMT
- Title: VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks
- Authors: Francesco Madeddu, Lucia Testa, Gianluca De Carlo, Michele Pieroni, Andrea Mastropietro, Aris Anagnostopoulos, Paolo Tieri, Sergio Barbarossa,
- Abstract summary: We present a comprehensive biological knowledge graph constructed by integrating and refining multiple publicly available datasets.<n>The resulting resource represents a coherent and reliable biological knowledge graph that serves as a state-of-the-art platform to advance research in computational biology and precision medicine.
- Score: 8.962235896860294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The intrinsic complexity of human biology presents ongoing challenges to scientific understanding. Researchers collaborate across disciplines to expand our knowledge of the biological interactions that define human life. AI methodologies have emerged as powerful tools across scientific domains, particularly in computational biology, where graph data structures effectively model biological entities such as protein-protein interaction (PPI) networks and gene functional networks. Those networks are used as datasets for paramount network medicine tasks, such as gene-disease association prediction, drug repurposing, and polypharmacy side effect studies. Reliable predictions from machine learning models require high-quality foundational data. In this work, we present a comprehensive multi-purpose biological knowledge graph constructed by integrating and refining multiple publicly available datasets. Building upon the Drug Repurposing Knowledge Graph (DRKG), we define a pipeline tasked with a) cleaning inconsistencies and redundancies present in DRKG, b) coalescing information from the main available public data sources, and c) enriching the graph nodes with expressive feature vectors such as molecular fingerprints and gene ontologies. Biologically and chemically relevant features improve the capacity of machine learning models to generate accurate and well-structured embedding spaces. The resulting resource represents a coherent and reliable biological knowledge graph that serves as a state-of-the-art platform to advance research in computational biology and precision medicine. Moreover, it offers the opportunity to benchmark graph-based machine learning and network medicine models on relevant tasks. We demonstrate the effectiveness of the proposed dataset by benchmarking it against the task of drug repurposing, PPI prediction, and side-effect prediction, modeled as link prediction problems.
Related papers
- Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs [2.006175707670159]
PrimeKG++ is an enriched knowledge graph incorporating multimodal data.<n>Our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes.
arXiv Detail & Related papers (2025-01-03T05:29:12Z) - Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets.<n>Key theoretical contribution is the structural sparsity of causal connections between modalities.<n>Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.<n>BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.<n>It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z) - Extracting Protein-Protein Interactions (PPIs) from Biomedical
Literature using Attention-based Relational Context Information [5.456047952635665]
This work presents a unified, multi-source PPI corpora with vetted interaction definitions augmented by binary interaction type labels.
A Transformer-based deep learning method exploits entities' relational context information for relation representation to improve relation classification performance.
The model's performance is evaluated on four widely studied biomedical relation extraction datasets.
arXiv Detail & Related papers (2024-03-08T01:43:21Z) - Graph Relation Distillation for Efficient Biomedical Instance
Segmentation [80.51124447333493]
We propose a graph relation distillation approach for efficient biomedical instance segmentation.
We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level.
Experimental results on a number of biomedical datasets validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-01-12T04:41:23Z) - Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z) - Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey [6.288056740658763]
Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data.
Recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery.
arXiv Detail & Related papers (2023-02-16T12:38:01Z) - Predicting Biomedical Interactions with Probabilistic Model Selection
for Graph Neural Networks [5.156812030122437]
Current biological networks are noisy, sparse, and incomplete. Experimental identification of such interactions is both time-consuming and expensive.
Deep graph neural networks have shown their effectiveness in modeling graph-structured data and achieved good performance in biomedical interaction prediction.
Our proposed method enables the graph convolutional networks to dynamically adapt their depths to accommodate an increasing number of interactions.
arXiv Detail & Related papers (2022-11-22T20:44:28Z) - Contrastive Brain Network Learning via Hierarchical Signed Graph Pooling
Model [64.29487107585665]
Graph representation learning techniques on brain functional networks can facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases.
Here, we propose an interpretable hierarchical signed graph representation learning model to extract graph-level representations from brain functional networks.
In order to further improve the model performance, we also propose a new strategy to augment functional brain network data for contrastive learning.
arXiv Detail & Related papers (2022-07-14T20:03:52Z) - Graph-in-Graph (GiG): Learning interpretable latent graphs in
non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain.
Recent works have shown that considering relationships between input data samples have a positive regularizing effect for the downstream task.
We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
arXiv Detail & Related papers (2022-04-01T10:01:37Z) - Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z) - Biomedical Knowledge Graph Refinement and Completion using Graph
Representation Learning and Top-K Similarity Measure [1.4660617536303606]
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RD.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
arXiv Detail & Related papers (2020-12-18T22:19:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.