BioBLP: A Modular Framework for Learning on Multimodal Biomedical
Knowledge Graphs
- URL: http://arxiv.org/abs/2306.03606v1
- Date: Tue, 6 Jun 2023 11:49:38 GMT
- Title: BioBLP: A Modular Framework for Learning on Multimodal Biomedical
Knowledge Graphs
- Authors: Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg,
Michael Cochez, Paul Groth
- Abstract summary: We propose a modular framework for learning embeddings in knowledge graphs.
It allows encoding attribute data of different modalities while also supporting entities with missing attributes.
We train models using a biomedical KG containing approximately 2 million triples.
- Score: 3.780924717521521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graphs (KGs) are an important tool for representing complex
relationships between entities in the biomedical domain. Several methods have
been proposed for learning embeddings that can be used to predict new links in
such graphs. Some methods ignore valuable attribute data associated with
entities in biomedical KGs, such as protein sequences or molecular graphs.
Other works incorporate such data, but assume that entities can be represented
with the same data modality. This is not always the case for biomedical KGs,
where entities exhibit heterogeneous modalities that are central to their
representation in the subject domain.
We propose a modular framework for learning embeddings in KGs with entity
attributes that allows encoding attribute data of different modalities while
also supporting entities with missing attributes. We additionally propose an
efficient pretraining strategy for reducing the required training runtime. We
train models using a biomedical KG containing approximately 2 million triples,
and evaluate the performance of the resulting entity embeddings on the tasks of
link prediction and drug-protein interaction prediction, comparing against
methods that do not take attribute data into account. In the standard link
prediction evaluation, the proposed method results in competitive, yet lower
performance than baselines that do not use attribute data. When evaluated in
the task of drug-protein interaction prediction, the method compares favorably
with the baselines. We find settings involving low-degree entities, which make
up a substantial share of the entities in the KG, where our method
outperforms the baselines. Our proposed pretraining strategy yields
significantly higher performance while reducing the required training runtime.
Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP .
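The modular idea described in the abstract (one encoder per attribute modality, with a fallback for entities that lack attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the toy character-count sequence encoder, the embedding size, and the TransE-style scoring function are all assumptions chosen for illustration only.

```python
import math
import random

DIM = 8  # embedding dimension (arbitrary choice for this sketch)


def _random_vector(rng, dim=DIM):
    """Small random vector, standing in for a learnable embedding."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


class LookupEncoder:
    """Hypothetical fallback: one free embedding per entity without attributes."""

    def __init__(self, rng):
        self.rng = rng
        self.table = {}

    def encode(self, entity_id, attribute=None):
        if entity_id not in self.table:
            self.table[entity_id] = _random_vector(self.rng)
        return self.table[entity_id]


class SequenceEncoder:
    """Toy stand-in for a protein-sequence encoder: normalized hashed character counts."""

    def encode(self, entity_id, attribute):
        vec = [0.0] * DIM
        for ch in attribute:
            vec[ord(ch) % DIM] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


class ModularEntityEncoder:
    """Routes each entity to the encoder for its modality, else to the lookup table."""

    def __init__(self, encoders, fallback):
        self.encoders = encoders
        self.fallback = fallback

    def encode(self, entity_id, modality=None, attribute=None):
        encoder = self.encoders.get(modality)
        if encoder is None or attribute is None:
            return self.fallback.encode(entity_id)
        return encoder.encode(entity_id, attribute)


def transe_score(head, relation, tail):
    """TransE-style score: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


rng = random.Random(0)
model = ModularEntityEncoder({"protein_sequence": SequenceEncoder()}, LookupEncoder(rng))
protein = model.encode("P1", "protein_sequence", "MKTAYIAKQR")
disease = model.encode("D1")  # no attribute available, so the lookup table is used
relation = _random_vector(rng)
score = transe_score(protein, relation, disease)
```

In this sketch, every entity ends up in the same embedding space regardless of which encoder produced its vector, so a single scoring function can rank candidate links between entities of different modalities.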
Related papers
- Extracting Protein-Protein Interactions (PPIs) from Biomedical
Literature using Attention-based Relational Context Information [5.456047952635665]
This work presents a unified, multi-source PPI corpus with vetted interaction definitions augmented by binary interaction type labels.
A Transformer-based deep learning method exploits entities' relational context information for relation representation to improve relation classification performance.
The model's performance is evaluated on four widely studied biomedical relation extraction datasets.
arXiv Detail & Related papers (2024-03-08T01:43:21Z)
- Pseudo Label-Guided Data Fusion and Output Consistency for
Semi-Supervised Medical Image Segmentation [9.93871075239635]
We propose the PLGDF framework, which builds upon the mean teacher network for segmenting medical images with less annotation.
We propose a novel pseudo-label utilization scheme, which combines labeled and unlabeled data to augment the dataset effectively.
Our framework yields superior performance compared to six state-of-the-art semi-supervised learning methods.
arXiv Detail & Related papers (2023-11-17T06:36:43Z)
- BioREx: Improving Biomedical Relation Extraction by Leveraging
Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset.
arXiv Detail & Related papers (2023-06-19T22:48:18Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Tyger: Task-Type-Generic Active Learning for Molecular Property
Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep Active Learning methods are developed to select only the most representative and informative data for annotating.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z)
- Slot Filling for Biomedical Information Extraction [0.5330240017302619]
We present a slot filling approach to the task of biomedical IE.
We follow the proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model.
arXiv Detail & Related papers (2021-09-17T14:16:00Z)
- EchoEA: Echo Information between Entities and Relations for Entity
Alignment [1.1470070927586016]
We propose a novel framework, Echo Entity Alignment (EchoEA), which leverages a self-attention mechanism to spread entity information to relations and echo it back to entities.
The experimental results on three real-world cross-lingual datasets are stable at around 96% at hits@1 on average.
arXiv Detail & Related papers (2021-07-07T07:34:21Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge
Graph Summarization [64.56399911605286]
We propose SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module.
SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low data relation types.
arXiv Detail & Related papers (2020-10-04T00:14:57Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)