Relationship extraction for knowledge graph creation from biomedical
literature
- URL: http://arxiv.org/abs/2201.01647v1
- Date: Wed, 5 Jan 2022 15:09:33 GMT
- Title: Relationship extraction for knowledge graph creation from biomedical
literature
- Authors: Nikola Milosevic, Wolfgang Thielemann
- Abstract summary: We present and compare a few rule-based and machine learning-based methods for scalable relationship extraction from biomedical literature.
We examine how resilient these methods are to unbalanced and fairly small datasets.
The best-performing model was a T5 model fine-tuned on balanced data, with a reported F1-score of 0.88.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Biomedical research is growing at such an exponential pace that scientists,
researchers and practitioners are no longer able to cope with the amount of
published literature in the domain. The knowledge presented in the literature
needs to be systematized in such a way that claims and hypotheses can be
easily found, accessed and validated. Knowledge graphs can provide such a
framework for semantic knowledge representation from literature. However, in
order to build a knowledge graph, it is necessary to extract knowledge in the form of
relationships between biomedical entities and to normalize both entities and
relationship types. In this paper, we present and compare a few rule-based and
machine learning-based methods (Naive Bayes and Random Forests as examples of
traditional machine learning methods, and a T5-based model as an example of modern
deep learning) for scalable relationship extraction from biomedical
literature for integration into knowledge graphs. We examine how
resilient these methods are to unbalanced and fairly small datasets,
showing that the T5 model handles both small datasets, owing to its pre-training
on the large C4 dataset, and unbalanced data well. The best-performing model was
a T5 model fine-tuned on balanced data, with a reported F1-score of 0.88.
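To make the terms "balanced data" and "F1-score" in the abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how the data were balanced, so random undersampling is an assumption, and the label names (`treats`, `no_relation`), the toy sentences and both helper functions are hypothetical.

```python
import random
from collections import Counter, defaultdict


def f1_score(y_true, y_pred, positive):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def undersample(examples, seed=0):
    """Balance (text, label) pairs by downsampling each class to the rarest one.

    Assumption: undersampling is only one way to balance a dataset; the
    source does not state which strategy was used.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 2 "treats" sentences versus 6 "no_relation" sentences.
toy = [(f"sentence {i}", "treats") for i in range(2)] + [
    (f"sentence {i}", "no_relation") for i in range(2, 8)
]
balanced = undersample(toy)
label_counts = Counter(label for _, label in balanced)

# Hypothetical gold labels and predictions for four sentences.
y_true = ["treats", "treats", "no_relation", "no_relation"]
y_pred = ["treats", "no_relation", "no_relation", "treats"]
score = f1_score(y_true, y_pred, positive="treats")
```

In this toy run each class is reduced to two examples, and the predictions yield one true positive, one false positive and one false negative for `treats`, so precision and recall are both 0.5.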
Related papers
- Representation-Enhanced Neural Knowledge Integration with Application to Large-Scale Medical Ontology Learning [3.010503480024405]
We propose a theoretically guaranteed statistical framework, called RENKI, to enable simultaneous learning of relation types.
The proposed framework incorporates representation learning output into initial entity embedding of a neural network that approximates the score function for the knowledge graph.
We demonstrate the effect of weighting in the presence of heterogeneous relations and the benefit of incorporating representation learning in nonparametric models.
arXiv Detail & Related papers (2024-10-09T21:38:48Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
- Graph Relation Distillation for Efficient Biomedical Instance Segmentation [80.51124447333493]
We propose a graph relation distillation approach for efficient biomedical instance segmentation.
We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level.
Experimental results on a number of biomedical datasets validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-01-12T04:41:23Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs [16.566710222582618]
We show how knowledge graph embedding models can be affected by structural imbalance.
We show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information.
arXiv Detail & Related papers (2021-12-13T11:20:36Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- Biomedical Knowledge Graph Refinement and Completion using Graph Representation Learning and Top-K Similarity Measure [1.4660617536303606]
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RD.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
arXiv Detail & Related papers (2020-12-18T22:19:57Z)
- Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings [8.835844347471626]
We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph.
We make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation.
arXiv Detail & Related papers (2020-06-24T14:47:33Z)