Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study
- URL: http://arxiv.org/abs/2106.09700v1
- Date: Thu, 17 Jun 2021 17:55:33 GMT
- Authors: Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh
Hajishirzi, Tom Hope
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biomedical knowledge graphs (KGs) hold rich information on entities such as
diseases, drugs, and genes. Predicting missing links in these graphs can boost
many important applications, such as drug design and repurposing. Recent work
has shown that general-domain language models (LMs) can serve as "soft" KGs,
and that they can be fine-tuned for the task of KG completion. In this work, we
study scientific LMs for KG completion, exploring whether we can tap into their
latent knowledge to enhance biomedical link prediction. We evaluate several
domain-specific LMs, fine-tuning them on datasets centered on drugs and
diseases that we represent as KGs and enrich with textual entity descriptions.
We integrate the LM-based models with KG embedding models, using a router
method that learns to assign each input example to either type of model and
provides a substantial boost in performance. Finally, we demonstrate the
advantage of LM models in the inductive setting with novel scientific entities.
Our datasets and code are made publicly available.
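The router method described in the abstract can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: both scorers are toy placeholders, and the routing rule here is a hand-crafted degree check (route triples touching entities unseen in the KG to the LM, reflecting the paper's inductive-setting finding), whereas the paper learns the assignment from data. All entity names and scores are hypothetical.

```python
# Sketch of a "router" that assigns each link-prediction query to either a
# KG-embedding scorer or an LM-based scorer. Toy stand-ins throughout; the
# paper learns this routing rather than using a fixed rule.
import random

# Toy KG: (head, relation, tail) triples seen during training (hypothetical).
train_triples = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "inflammation"),
    ("metformin", "treats", "diabetes"),
]
kg_entities = {e for h, _, t in train_triples for e in (h, t)}

def kg_embedding_score(head, relation, tail):
    """Stand-in for a KG embedding model scoring a triple with learned
    vectors; here just a repeatable pseudo-random score."""
    rng = random.Random(hash((head, relation, tail)) % (2**32))
    return rng.random()

def lm_score(head, relation, tail):
    """Stand-in for a fine-tuned LM scoring the triple rendered as text
    (optionally enriched with entity descriptions, as in the paper);
    here a placeholder heuristic on the rendered string."""
    text = f"{head} {relation} {tail}"
    return (len(text) % 10) / 10.0

def route(head, relation, tail):
    """Assign the query to one model. Rule of thumb used here: entities
    unseen in the KG have no trained embedding, so send those queries to
    the LM; otherwise use the KG embedding model."""
    if head not in kg_entities or tail not in kg_entities:
        return "lm", lm_score(head, relation, tail)
    return "kg", kg_embedding_score(head, relation, tail)

# Both entities known -> KG embedding model; novel entity -> LM.
print(route("aspirin", "treats", "inflammation")[0])   # kg
print(route("semaglutide", "treats", "diabetes")[0])   # lm
```

In the paper the router is itself a learned model over input examples; the degree-based rule above only illustrates why such a division of labor helps, since embedding models have nothing useful to say about entities absent from training.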
Related papers
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs (arXiv, 2023-12-21)
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that our methodology improves performance in several instances while keeping computing requirements low.

- A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises (arXiv, 2023-06-07)
This work provides the first comprehensive review of healthcare knowledge graphs (HKGs). It summarizes the pipeline and key techniques for HKG construction, as well as common utilization approaches. At the application level, we delve into the successful integration of HKGs across various health domains.

- Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey (arXiv, 2023-02-16)
Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data. Recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery.

- KG-Hub -- Building and Exchanging Biological Knowledge Graphs (arXiv, 2023-01-31)
KG-Hub is a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research.

- Large Language Models for Biomedical Knowledge Graph Construction: Information extraction from EMR notes (arXiv, 2023-01-29)
We propose an end-to-end machine learning solution based on large language models (LLMs). The entities used in the KG construction process are diseases, factors, and treatments, as well as manifestations that coexist with the patient while experiencing the disease. The application of the proposed methodology is demonstrated on age-related macular degeneration.

- Deep Bidirectional Language-Knowledge Graph Pretraining (arXiv, 2022-10-17)
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KGs at scale. The model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.

- BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models (arXiv, 2022-06-28)
We propose a new approach for harvesting massive KGs of arbitrary relations from pretrained LMs. With minimal input in the form of a relation definition, the approach efficiently searches the vast entity-pair space to extract diverse, accurate knowledge. We deploy the approach to harvest KGs of over 400 new relations from different LMs.

- SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization (arXiv, 2020-10-04)
We propose SumGNN, a knowledge summarization graph neural network enabled by a subgraph extraction module. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low-data relation types.
This list is automatically generated from the titles and abstracts of the papers on this site.