Extracting Biomedical Factual Knowledge Using Pretrained Language Model
and Electronic Health Record Context
- URL: http://arxiv.org/abs/2209.07859v1
- Date: Fri, 26 Aug 2022 00:01:26 GMT
- Title: Extracting Biomedical Factual Knowledge Using Pretrained Language Model
and Electronic Health Record Context
- Authors: Zonghai Yao, Yi Cao, Zhichao Yang, Vijeta Deshpande, Hong Yu
- Abstract summary: We use prompt methods to extract knowledge from Language Models (LMs) as new Knowledge Bases (LMs as KBs).
We specifically add EHR notes as context to the prompt to improve this lower bound in the biomedical domain.
Our experiments show that the knowledge possessed by those language models can distinguish correct knowledge from noisy knowledge in the EHR notes.
- Score: 7.7971830917251275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Models (LMs) have performed well on biomedical natural language
processing applications. In this study, we conducted experiments using prompt
methods to extract knowledge from LMs as new Knowledge Bases (LMs as KBs).
However, prompting provides only a lower bound for knowledge extraction, and it
performs particularly poorly on biomedical-domain KBs. To make LMs as KBs better
match the actual application scenarios of the biomedical domain, we specifically
add EHR notes as context to the prompt to improve this lower bound. We design and
validate a series of experiments for our Dynamic-Context-BioLAMA task. Our
experiments show that the knowledge possessed by those language models can
distinguish correct knowledge from noisy knowledge in the EHR notes, and this
distinguishing ability can also be used as a new metric to evaluate the amount of
knowledge possessed by the model.
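
To make the prompt-based probing concrete, the sketch below shows one way a fill-mask probe could be run with and without an EHR-note context. It is a minimal illustration only: the model name, prompt template, and EHR snippet are placeholders, not the authors' Dynamic-Context-BioLAMA prompts or data.

```python
# Minimal sketch of prompt-based knowledge probing with and without EHR context.
# Illustrative only: the model, prompt template, and EHR snippet are placeholders,
# not the Dynamic-Context-BioLAMA prompts or data used in the paper.
from transformers import pipeline

# Any masked LM works here; a biomedical model (e.g. BioBERT or PubMedBERT)
# would be substituted in practice.
fill = pipeline("fill-mask", model="bert-base-uncased")
mask = fill.tokenizer.mask_token

# Plain prompt: probe only the knowledge stored in the LM's parameters.
plain_prompt = f"The first-line treatment for hypertension is {mask}."

# Context-augmented prompt: prepend a (hypothetical) EHR note snippet.
ehr_note = ("Patient presents with elevated blood pressure. "
            "Started on lisinopril 10 mg daily; advised a low-sodium diet.")
context_prompt = f"{ehr_note} The first-line treatment for hypertension is {mask}."

for label, text in [("no context", plain_prompt), ("EHR context", context_prompt)]:
    preds = fill(text, top_k=5)  # top-5 fillers, in the spirit of Acc@5-style probing
    print(label, [p["token_str"] for p in preds])
```

Comparing how the ranked fillers change once the note is added mirrors the idea of using EHR context to raise the probing lower bound.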
Related papers
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation [0.0]
This work explores the potential of Large Language Models for dialoguing with biomedical background knowledge.
The framework involves three evaluation steps, each sequentially assessing different aspects: fluency, prompt alignment, semantic coherence, factual knowledge, and specificity of the generated responses.
The work provides a systematic assessment of the ability of eleven state-of-the-art LLMs, including ChatGPT, GPT-4 and Llama 2, in two prompting-based tasks.
arXiv Detail & Related papers (2023-05-28T22:46:21Z) - Can Language Models be Biomedical Knowledge Bases? [18.28724653601921]
We create the BioLAMA benchmark, comprising 49K biomedical factual knowledge triples for probing biomedical LMs.
We find that biomedical LMs with recently proposed probing methods can achieve up to 18.51% Acc@5 on retrieving biomedical knowledge.
arXiv Detail & Related papers (2021-09-15T08:34:56Z) - Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model, which provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies [25.990479833023166]
Biomedical question-answering (QA) has gained increased attention for its capability to provide users with high-quality information from a vast scientific literature.
We propose a simple yet unexplored approach, which we call biomedical entity-aware masking (BEM).
We encourage masked language models to learn entity-centric knowledge based on the pivotal entities characterizing the domain at hand, and employ those entities to drive the LM fine-tuning. Experimental results show performance on par with state-of-the-art models on several biomedical QA datasets.
arXiv Detail & Related papers (2021-02-16T18:51:13Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - Domain-Specific Language Model Pretraining for Biomedical Natural
Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z) - Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings [8.835844347471626]
We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph.
We make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation.
arXiv Detail & Related papers (2020-06-24T14:47:33Z)