BIOS: An Algorithmically Generated Biomedical Knowledge Graph
- URL: http://arxiv.org/abs/2203.09975v1
- Date: Fri, 18 Mar 2022 14:09:22 GMT
- Title: BIOS: An Algorithmically Generated Biomedical Knowledge Graph
- Authors: Sheng Yu, Zheng Yuan, Jun Xia, Shengxuan Luo, Huaiyuan Ying, Sihang
Zeng, Jingyi Ren, Hongyi Yuan, Zhengyun Zhao, Yucong Lin, Keming Lu, Jing
Wang, Yutao Xie, Heung-Yeung Shum
- Abstract summary: We introduce the Biomedical Informatics Ontology System (BIOS), the first large scale publicly available BioMedKG that is fully generated by machine learning algorithms.
BIOS contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets.
Results suggest that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.
- Score: 4.030892610300306
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Biomedical knowledge graphs (BioMedKGs) are essential infrastructures for
biomedical and healthcare big data and artificial intelligence (AI),
facilitating natural language processing, model development, and data exchange.
For many decades, these knowledge graphs have been built via expert curation,
which can no longer catch up with the speed of today's AI development, and a
transition to algorithmically generated BioMedKGs is necessary. In this work,
we introduce the Biomedical Informatics Ontology System (BIOS), the first large
scale publicly available BioMedKG that is fully generated by machine learning
algorithms. BIOS currently contains 4.1 million concepts, 7.4 million terms in
two languages, and 7.3 million relation triplets. We introduce the methodology
for developing BIOS, which covers curation of raw biomedical terms,
computationally identifying synonymous terms and aggregating them to create
concept nodes, semantic type classification of the concepts, relation
identification, and biomedical machine translation. We provide statistics about
the current content of BIOS and perform preliminary assessment for term
quality, synonym grouping, and relation extraction. Results suggest that
machine learning-based BioMedKG development is a totally viable solution for
replacing traditional expert curation.
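The construction pipeline outlined in the abstract (raw term curation, synonym grouping into concept nodes, semantic typing, relation identification, machine translation) is not accompanied by code here; the snippet below is a minimal, illustrative sketch of just the synonym-grouping step, using character-trigram overlap and an arbitrary threshold as stand-ins for the learned term representations the paper describes.

```python
from itertools import combinations

def trigrams(term: str) -> set[str]:
    """Character trigrams of a padded, lower-cased term."""
    t = f"  {term.lower()} "
    return {t[i:i + 3] for i in range(len(t) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (stand-in for a learned model)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def group_synonyms(terms: list[str], threshold: float = 0.5) -> list[set[str]]:
    """Union-find clustering: any pair above the threshold shares a concept node."""
    parent = list(range(len(terms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(terms)), 2):
        if similarity(terms[i], terms[j]) >= threshold:
            parent[find(i)] = find(j)

    concepts: dict[int, set[str]] = {}
    for i, term in enumerate(terms):
        concepts.setdefault(find(i), set()).add(term)
    return list(concepts.values())

if __name__ == "__main__":
    raw_terms = ["myocardial infarction", "myocardial infarct", "heart attack",
                 "type 2 diabetes", "diabetes mellitus type 2"]
    for concept in group_synonyms(raw_terms):
        print(sorted(concept))
```

Purely lexical similarity of this kind misses true synonyms such as "heart attack" vs. "myocardial infarction", which is exactly why the paper relies on machine-learned representations for this step.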
Related papers
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
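A minimal sketch of the adapter idea summarized above, assuming a standard bottleneck (Houlsby-style) adapter inserted into a frozen PLM; the hidden and bottleneck sizes are illustrative, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    """Bottleneck adapter trained on KG-derived data while the PLM stays frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen PLM representation passes through
        # unchanged, with a small learned correction added on top.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = KnowledgeAdapter()
x = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```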
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs [45.53337864477857]
Know2BIO is a general-purpose heterogeneous KG benchmark for the biomedical domain.
It integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories.
Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science.
arXiv Detail & Related papers (2023-10-05T00:34:56Z)
- Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs [0.9085310904484414]
Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems.
KG completion (KGC) can therefore help researchers make predictions to inform tasks like drug repositioning.
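To make the KGC task concrete, the sketch below scores candidate (drug, treats, disease) triples with a TransE-style distance; TransE is used here only as a familiar baseline scorer and is not necessarily the neurosymbolic approach the paper discusses.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32
# In practice these embeddings are learned from the observed triples in the KG;
# random vectors are used here only so the snippet runs standalone.
entities = {name: rng.normal(size=dim) for name in
            ["metformin", "aspirin", "type_2_diabetes", "headache"]}
relations = {"treats": rng.normal(size=dim)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """TransE plausibility: higher (less negative) means head + relation is close to tail."""
    return -float(np.linalg.norm(entities[head] + relations[relation] - entities[tail]))

# Rank candidate "treats" edges for a disease -- the KGC view of drug repositioning.
candidates = [("metformin", "treats", "type_2_diabetes"),
              ("aspirin", "treats", "type_2_diabetes")]
for triple in sorted(candidates, key=lambda t: transe_score(*t), reverse=True):
    print(triple, round(transe_score(*triple), 3))
```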
arXiv Detail & Related papers (2023-07-17T11:47:05Z)
- Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking [4.8882241537236455]
This research investigates a method that exploits the in-context learning capabilities of large models for biomedical concept linking.
The proposed approach adopts a two-stage retrieve-and-rank framework.
It achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization.
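A minimal sketch of such a retrieve-and-rank setup, with a toy lexical retriever and a hand-written ranking prompt standing in for the paper's retriever and LLM call; the ontology entries, scoring heuristic, and prompt wording are assumptions, not the authors' implementation.

```python
# Toy ontology: concept ID -> preferred name (placeholder entries).
ONTOLOGY = {
    "D003924": "Type 2 Diabetes Mellitus",
    "D009203": "Myocardial Infarction",
    "D006973": "Hypertension",
}

def retrieve(mention: str, k: int = 2) -> list[tuple[str, str]]:
    # Stage 1: candidate retrieval. Word-overlap scoring stands in for the
    # dense or lexical retriever a real system would use.
    mention_words = set(mention.lower().split())
    def overlap(name: str) -> int:
        return len(mention_words & set(name.lower().split()))
    return sorted(ONTOLOGY.items(), key=lambda kv: overlap(kv[1]), reverse=True)[:k]

def build_prompt(mention: str, candidates: list[tuple[str, str]]) -> str:
    # Stage 2: an in-context ranking prompt; demonstration examples would be
    # prepended here before sending the prompt to the LLM.
    lines = [f"Mention: {mention}", "Candidates:"]
    lines += [f"  {cid}: {name}" for cid, name in candidates]
    lines.append("Answer with the single best concept ID.")
    return "\n".join(lines)

mention = "type 2 diabetes"
print(build_prompt(mention, retrieve(mention)))  # this prompt would go to the LLM
```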
arXiv Detail & Related papers (2023-07-03T16:19:50Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
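A rough sketch of that two-stage curriculum, using toy stand-in modules: stage 1 trains only the vision-to-language projector on figure-caption pairs while everything else stays frozen, and stage 2 also unfreezes the language model for open-ended instruction tuning. Module names and sizes are placeholders, not LLaVA-Med's actual code.

```python
import torch.nn as nn

vision_encoder = nn.Linear(512, 256)   # stand-in for a frozen image encoder
projector = nn.Linear(256, 128)        # maps image features into LM token space
language_model = nn.Linear(128, 128)   # stand-in for the large language model

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: biomedical concept alignment on figure-caption pairs --
# only the projector is updated.
for module, flag in [(vision_encoder, False), (language_model, False), (projector, True)]:
    set_trainable(module, flag)
stage1_params = sum(p.numel() for m in (vision_encoder, projector, language_model)
                    for p in m.parameters() if p.requires_grad)

# Stage 2: open-ended instruction tuning -- the language model is unfrozen too.
set_trainable(language_model, True)
stage2_params = sum(p.numel() for m in (vision_encoder, projector, language_model)
                    for p in m.parameters() if p.requires_grad)

print(stage1_params, stage2_params)   # trainable parameter counts per stage
```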
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)