Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical
Knowledge Graphs
- URL: http://arxiv.org/abs/2310.03221v1
- Date: Thu, 5 Oct 2023 00:34:56 GMT
- Title: Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical
Knowledge Graphs
- Authors: Yijia Xiao, Dylan Steinecke, Alexander Russell Pelletier, Yushi Bai,
Peipei Ping, Wei Wang
- Abstract summary: Know2BIO is a general-purpose heterogeneous KG benchmark for the biomedical domain.
It integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories.
Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science.
- Score: 45.53337864477857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge graphs (KGs) have emerged as a powerful framework for representing
and integrating complex biomedical information. However, assembling KGs from
diverse sources remains a significant challenge in several aspects, including
entity alignment, scalability, and the need for continuous updates to keep pace
with scientific advancements. Moreover, the representative power of KGs is
often limited by the scarcity of multi-modal data integration. To overcome
these challenges, we propose Know2BIO, a general-purpose heterogeneous KG
benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse
sources, capturing intricate relationships across 11 biomedical categories. It
currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable
of user-directed automated updating to reflect the latest knowledge in
biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data:
node features including text descriptions, protein and compound sequences and
structures, enabling the utilization of emerging natural language processing
methods and multi-modal data integration strategies. We evaluate KG
representation models on Know2BIO, demonstrating its effectiveness as a
benchmark for KG representation learning in the biomedical field. Data and
source code of Know2BIO are available at
https://github.com/Yijia-Xiao/Know2BIO/.
Related papers
- Leveraging Biomolecule and Natural Language through Multi-Modal
Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning [77.90250740041411]
This paper introduces BioT5+, an extension of the BioT5 framework, tailored to enhance biological research and drug discovery.
BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources like bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique for improved processing of numerical data.
arXiv Detail & Related papers (2024-02-27T12:43:09Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- From Large Language Models to Knowledge Graphs for Biomarker Discovery
in Cancer [0.9437165725355702]
A challenging scenario for artificial intelligence (AI) is using biomedical data to provide diagnosis and treatment recommendations for cancerous conditions.
A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations.
In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA.
arXiv Detail & Related papers (2023-10-12T14:36:13Z)
- BioT5: Enriching Cross-modal Integration in Biology with Chemical
Knowledge and Natural Language Associations [54.97423244799579]
BioT5 is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations.
BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
arXiv Detail & Related papers (2023-10-11T07:57:08Z)
- Biomedical Multi-hop Question Answering Using Knowledge Graph Embeddings
and Language Models [0.0]
We have created a multi-hop biomedical question-answering dataset in natural language for testing the biomedical multi-hop question-answering system.
The major contribution of this research is an integrated system that combines language models with KG embeddings to give highly relevant answers to free-form questions.
arXiv Detail & Related papers (2022-11-10T05:43:57Z)
- BigBIO: A Framework for Data-Centric Biomedical Natural Language
Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schema, data auditing, and contribution guidelines, and outline two illustrative use cases.
arXiv Detail & Related papers (2022-06-30T07:15:45Z)
- BIOS: An Algorithmically Generated Biomedical Knowledge Graph [4.030892610300306]
We introduce the Biomedical Informatics Ontology System (BIOS), the first large scale publicly available BioMedKG that is fully generated by machine learning algorithms.
BIOS contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets.
Results suggest that machine learning-based BioMedKG development is a totally viable solution for replacing traditional expert curation.
arXiv Detail & Related papers (2022-03-18T14:09:22Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.