Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models
- URL: http://arxiv.org/abs/2110.08173v1
- Date: Fri, 15 Oct 2021 16:00:11 GMT
- Title: Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models
- Authors: Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte Collins, Nigel Collier
- Abstract summary: We release a well-curated biomedical knowledge probing benchmark, MedLAMA, based on the Unified Medical Language System (UMLS) Metathesaurus.
We test a wide spectrum of state-of-the-art PLMs and probing approaches on our benchmark, finding that they reach at most 3% acc@10.
We propose Contrastive-Probe, a novel self-supervised contrastive probing approach that adjusts the underlying PLMs without using any probing data.
- Score: 16.535312449449165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge probing is crucial for understanding the knowledge transfer
mechanism behind pre-trained language models (PLMs). Despite the growing
progress of knowledge probing for PLMs in the general domain, specialised areas
such as the biomedical domain remain vastly under-explored. To catalyse research
in this direction, we release a well-curated biomedical knowledge probing
benchmark, MedLAMA, built on the Unified Medical Language
System (UMLS) Metathesaurus. We test a wide spectrum of state-of-the-art PLMs
and probing approaches on our benchmark, finding that they reach at most 3%
acc@10. While highlighting various sources of domain-specific challenges that
account for this underwhelming performance, we illustrate that the underlying
PLMs have a higher potential for probing tasks. To unlock this potential, we
propose Contrastive-Probe, a novel self-supervised contrastive probing approach
that adjusts the underlying PLMs without using any probing data. While
Contrastive-Probe pushes acc@10 to 28%, the performance gap remains notable.
Our human expert evaluation suggests that the probing performance of
Contrastive-Probe is still underestimated, as UMLS does not yet include the
full spectrum of factual knowledge. We hope MedLAMA and Contrastive-Probe
facilitate further development of better-suited probing techniques for this
domain.
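For intuition, here is a minimal, hypothetical sketch of the rewire-then-probe recipe described in the abstract: a SimCSE-style self-contrastive tuning step that "rewires" the PLM using raw text only (no probing data), followed by a [MASK]-filling probe scored with acc@k. The model name, corpus sentences, and hyperparameters are illustrative assumptions, not the paper's actual setup; real MedLAMA answers are multi-token UMLS entity names, which this single-token probe deliberately simplifies.

```python
# Hypothetical sketch only: model, corpus, and hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"  # assumption; the paper targets biomedical PLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

def embed(sentences):
    """Mean-pooled last hidden states; with dropout active in train mode,
    encoding the same batch twice yields two different 'views'."""
    batch = tok(sentences, padding=True, return_tensors="pt")
    hidden = model(**batch, output_hidden_states=True).hidden_states[-1]
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)

def rewire_step(sentences, optimizer, temperature=0.05):
    """One self-contrastive 'rewiring' update (no probing data): two
    dropout-perturbed encodings of each sentence form a positive pair;
    the other sentences in the batch act as in-batch negatives (InfoNCE)."""
    model.train()
    z1, z2 = embed(sentences), embed(sentences)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)
    loss = F.cross_entropy(sim / temperature, torch.arange(len(sentences)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def probe_acc_at_k(queries, answers, k=10):
    """Mask-filling probe: for each query containing one [MASK], check
    whether the gold answer token ranks in the model's top k predictions."""
    model.eval()
    hits = 0
    for query, answer in zip(queries, answers):
        batch = tok(query, return_tensors="pt")
        mask_pos = (batch["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
        logits = model(**batch).logits[0, mask_pos]
        hits += int(tok.convert_tokens_to_ids(answer) in logits.topk(k).indices)
    return hits / len(queries)

# Illustrative usage with made-up data:
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)
rewire_step(["aspirin is used to treat headache.",
             "metformin lowers blood glucose levels."], opt)
print(probe_acc_at_k(["aspirin is used to treat [MASK]."], ["headache"]))
```

The dropout-based augmentation here merely stands in for a self-supervised contrastive objective; the essential point it illustrates is that the rewiring step consumes unlabelled sentences, so no probing queries leak into tuning.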
Related papers
- LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction [13.965777046473885]
Large Language Models (LLMs) are increasingly adopted for applications in healthcare.
It is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain.
arXiv Detail & Related papers (2024-08-22T09:37:40Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- On-the-fly Definition Augmentation of LLMs for Biomedical NER [28.02028191114401]
LLMs struggle on biomedical NER tasks due to specialized terminology and lack of training data.
We develop a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly.
We find that careful prompting strategies also improve LLM performance, allowing them to outperform fine-tuned language models in few-shot settings.
arXiv Detail & Related papers (2024-03-29T20:59:27Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine [89.46836590149883]
We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training.
We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks.
With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite.
arXiv Detail & Related papers (2023-11-28T03:16:12Z)
- Knowledge-injected Prompt Learning for Chinese Biomedical Entity Normalization [6.927883826415262]
We propose a novel Knowledge-injected Prompt Learning (PL-Knowledge) method to tackle the Biomedical Entity Normalization (BEN) task.
Specifically, our approach consists of five stages: candidate entity matching, knowledge extraction, knowledge encoding, knowledge injection, and prediction output.
By effectively encoding the knowledge items contained in medical entities, the additional knowledge enhances the model's ability to capture latent relationships between medical entities.
arXiv Detail & Related papers (2023-08-23T09:32:40Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation [0.0]
This work explores the potential of Large Language Models for dialoguing with biomedical background knowledge.
The framework involves three evaluation steps, each sequentially assessing different aspects: fluency, prompt alignment, semantic coherence, factual knowledge, and specificity of the generated responses.
The work provides a systematic assessment of the ability of eleven state-of-the-art LLMs, including ChatGPT, GPT-4 and Llama 2, on two prompting-based tasks.
arXiv Detail & Related papers (2023-05-28T22:46:21Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help pre-trained language models utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing [9.138354194112395]
We show that prompt-based probing methods can only probe a lower bound of knowledge.
We introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric.
arXiv Detail & Related papers (2022-11-18T14:44:09Z)
- Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer [63.72621204057025]
Expert-layman text style transfer technologies have the potential to improve communication between scientific communities and the general public.
High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
arXiv Detail & Related papers (2021-10-06T17:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.