Benchmark datasets for biomedical knowledge graphs with negative
statements
- URL: http://arxiv.org/abs/2307.11719v1
- Date: Fri, 21 Jul 2023 17:25:21 GMT
- Title: Benchmark datasets for biomedical knowledge graphs with negative
statements
- Authors: Rita T. Sousa, Sara Silva, Catia Pesquita
- Abstract summary: We present a collection of datasets for three relation prediction tasks that aim at circumventing the difficulties in building benchmarks for knowledge graphs with negative statements.
We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task.
The results show that the negative statements can improve the performance of knowledge graph embeddings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graphs represent facts about real-world entities. Most of these
facts are defined as positive statements. The negative statements are scarce
but highly relevant under the open-world assumption. Furthermore, they have
been demonstrated to improve the performance of several applications, namely in
the biomedical domain. However, no benchmark dataset supports the evaluation of
the methods that consider these negative statements.
We present a collection of datasets for three relation prediction tasks -
protein-protein interaction prediction, gene-disease association prediction and
disease prediction - that aim at circumventing the difficulties in building
benchmarks for knowledge graphs with negative statements. These datasets
include data from two successful biomedical ontologies, Gene Ontology and Human
Phenotype Ontology, enriched with negative statements.
We also generate knowledge graph embeddings for each dataset with two popular
path-based methods and evaluate the performance in each task. The results show
that the negative statements can improve the performance of knowledge graph
embeddings.
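As a rough illustration of what a path-based method consumes, the sketch below generates random walks over a toy graph that mixes positive and negative statements. It is a minimal sketch under stated assumptions, not the authors' pipeline: the triples, the `NOT_` relation prefix used to mark negative statements, and the function names are all hypothetical, and the abstract does not say which two path-based methods were used. In walk-based approaches of this kind, the resulting walks are typically treated as sentences and passed to a word2vec-style model to obtain entity embeddings.

```python
import random

# Toy triples; every identifier here is hypothetical and not taken from the datasets.
# Assumption: a negative statement is encoded with a "NOT_"-prefixed relation.
TRIPLES = [
    ("protein:P1", "hasFunction", "GO:A"),
    ("protein:P1", "NOT_hasFunction", "GO:B"),
    ("protein:P2", "hasFunction", "GO:B"),
    ("GO:A", "subClassOf", "GO:Root"),
    ("GO:B", "subClassOf", "GO:Root"),
]


def build_adjacency(triples, keep_negative=True):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = {}
    for head, relation, tail in triples:
        if not keep_negative and relation.startswith("NOT_"):
            continue  # drop negative statements for the positive-only baseline
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def random_walk(adjacency, start, depth, rng):
    """Follow up to `depth` randomly chosen outgoing edges starting at `start`."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = adjacency.get(node)
        if not edges:
            break  # dead end: no outgoing edges from this node
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk


def generate_walks(triples, entities, walks_per_entity=4, depth=2,
                   keep_negative=True, seed=0):
    """Generate a fixed number of random walks per entity."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples, keep_negative)
    return [
        random_walk(adjacency, entity, depth, rng)
        for entity in entities
        for _ in range(walks_per_entity)
    ]


if __name__ == "__main__":
    for walk in generate_walks(TRIPLES, ["protein:P1", "protein:P2"]):
        print(" -> ".join(walk))
```

Running the generator once with `keep_negative=True` and once with `keep_negative=False` gives two walk corpora, which is one simple way to compare embeddings learned with and without negative statements.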
Related papers
- The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models [3.1666540219908272]
We conduct a comprehensive investigation into the properties of publicly available biomedical Knowledge Graphs.
We establish links to the accuracy observed in real-world applications.
We release all model predictions and a new suite of analysis tools.
arXiv Detail & Related papers (2024-09-06T08:09:15Z) - The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges [101.83124435649358]
The homophily principle states that nodes with the same labels or similar attributes are more likely to be connected.
Recent work has identified a non-trivial set of datasets where the performance of GNNs compared to NNs is not satisfactory.
arXiv Detail & Related papers (2024-07-12T18:04:32Z) - Biomedical Knowledge Graph Embeddings with Negative Statements [1.7778609937758327]
Explicitly considering negative statements has been shown to improve performance on tasks such as entity summarization.
We propose a novel approach, TrueWalks, to incorporate negative statements into the knowledge graph representation learning process.
We present a novel walk-generation method that is able to not only differentiate between positive and negative statements but also take into account the semantic implications of negation.
arXiv Detail & Related papers (2023-08-07T10:08:25Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Heterogeneous Graph Neural Networks using Self-supervised Reciprocally
Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z) - Graph-in-Graph (GiG): Learning interpretable latent graphs in
non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain.
Recent works have shown that considering relationships between input data samples has a positive regularizing effect for the downstream task.
We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
arXiv Detail & Related papers (2022-04-01T10:01:37Z) - Implications of Topological Imbalance for Representation Learning on
Biomedical Knowledge Graphs [16.566710222582618]
We show how knowledge graph embedding models can be affected by structural imbalance.
We show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information.
arXiv Detail & Related papers (2021-12-13T11:20:36Z) - Biomedical Knowledge Graph Refinement and Completion using Graph
Representation Learning and Top-K Similarity Measure [1.4660617536303606]
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RDF.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
arXiv Detail & Related papers (2020-12-18T22:19:57Z) - Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
arXiv Detail & Related papers (2020-10-30T17:59:13Z) - Temporal Positive-unlabeled Learning for Biomedical Hypothesis
Generation via Risk Estimation [46.852387038668695]
This paper aims to introduce the use of machine learning to the scientific process of hypothesis generation.
We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings.
Experiment results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
arXiv Detail & Related papers (2020-10-05T10:58:03Z) - A Survey of Adversarial Learning on Graphs [59.21341359399431]
We investigate and summarize the existing works on graph adversarial learning tasks.
Specifically, we survey and unify the existing works w.r.t. attack and defense in graph analysis tasks.
We emphasize the importance of related evaluation metrics, and investigate and summarize them comprehensively.
arXiv Detail & Related papers (2020-03-10T12:48:00Z)