Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models
- URL: http://arxiv.org/abs/2110.08173v1
- Date: Fri, 15 Oct 2021 16:00:11 GMT
- Title: Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models
- Authors: Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte
Collins, Nigel Collier
- Abstract summary: We release a well-curated biomedical knowledge probing benchmark, MedLAMA, based on the Unified Medical Language System (UMLS) Metathesaurus.
We test a wide spectrum of state-of-the-art PLMs and probing approaches on our benchmark, reaching at most 3% acc@10.
We propose Contrastive-Probe, a novel self-supervised contrastive probing approach that adjusts the underlying PLMs without using any probing data.
- Score: 16.535312449449165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge probing is crucial for understanding the knowledge transfer
mechanism behind pre-trained language models (PLMs). Despite growing progress
in probing knowledge of PLMs in the general domain, specialised areas such as
the biomedical domain remain vastly under-explored. To catalyse research in
this direction, we release a well-curated biomedical knowledge probing
benchmark, MedLAMA, constructed from the Unified Medical Language System (UMLS)
Metathesaurus. We test a wide spectrum of state-of-the-art PLMs and probing
approaches on our benchmark, reaching at most 3% acc@10. While highlighting the
various sources of domain-specific challenges that account for this
underwhelming performance, we illustrate that the underlying PLMs have a higher
potential for probing tasks. To realise this potential, we propose
Contrastive-Probe, a novel self-supervised contrastive probing approach that
adjusts the underlying PLMs without using any probing data. While
Contrastive-Probe pushes the acc@10 to 28%, a notable performance gap remains.
Our human expert evaluation suggests that the probing performance of
Contrastive-Probe is still under-estimated, as UMLS does not cover the full
spectrum of factual knowledge. We hope MedLAMA and Contrastive-Probe facilitate
further development of probing techniques better suited to this domain.
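For reference, the acc@k metric reported above (the fraction of probing queries whose gold answer appears among the top-k predictions) can be sketched as follows. This is an illustrative implementation, not the paper's evaluation code; the function and variable names are hypothetical.

```python
def acc_at_k(ranked_predictions, gold_answers, k=10):
    """Fraction of queries whose gold answer appears in the top-k predictions.

    ranked_predictions: one ranked list of candidate answers per query
    gold_answers: one set of acceptable gold answers per query
    """
    hits = sum(
        any(pred in gold for pred in preds[:k])
        for preds, gold in zip(ranked_predictions, gold_answers)
    )
    return hits / len(gold_answers)

# Toy example with three queries, scored at k=2:
preds = [["aspirin", "ibuprofen"], ["insulin", "metformin"], ["statin", "heparin"]]
golds = [{"ibuprofen"}, {"warfarin"}, {"statin"}]
print(acc_at_k(preds, golds, k=2))  # 2 of 3 queries hit -> 0.666...
```

Under this metric, the paper's headline numbers mean that at most 3% of MedLAMA queries place a gold answer in a vanilla PLM's top 10, rising to 28% after Contrastive-Probe's self-supervised adjustment.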
Related papers
- On-the-fly Definition Augmentation of LLMs for Biomedical NER [28.02028191114401]
LLMs struggle on biomedical NER tasks due to specialized terminology and lack of training data.
We develop a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly.
We find that careful prompting strategies also improve LLM performance, allowing them to outperform fine-tuned language models in few-shot settings.
arXiv Detail & Related papers (2024-03-29T20:59:27Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine [89.46836590149883]
We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training.
We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks.
With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite.
arXiv Detail & Related papers (2023-11-28T03:16:12Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
- Knowledge-injected Prompt Learning for Chinese Biomedical Entity Normalization [6.927883826415262]
We propose a novel Knowledge-injected Prompt Learning (PL-Knowledge) method to tackle the Biomedical Entity Normalization (BEN) task.
Specifically, our approach consists of five stages: candidate entity matching, knowledge extraction, knowledge encoding, knowledge injection, and prediction output.
By effectively encoding the knowledge items contained in medical entities, the additional knowledge enhances the model's ability to capture latent relationships between medical entities.
arXiv Detail & Related papers (2023-08-23T09:32:40Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Injecting Knowledge into Biomedical Pre-trained Models via Polymorphism and Synonymous Substitution [22.471123408160658]
Pre-trained language models (PLMs) are considered capable of storing the relational knowledge present in their training data.
However, low-frequency relational knowledge may be under-expressed in PLMs compared to high-frequency knowledge.
We propose a simple-yet-effective approach to inject relational knowledge into PLMs.
arXiv Detail & Related papers (2023-05-24T10:48:53Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing [9.138354194112395]
We show that prompt-based probing methods can only probe a lower bound of knowledge.
We introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric.
arXiv Detail & Related papers (2022-11-18T14:44:09Z)
- Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer [63.72621204057025]
Expert-layman text style transfer technologies have the potential to improve communication between scientific communities and the general public.
High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
arXiv Detail & Related papers (2021-10-06T17:57:22Z)
- Can Language Models be Biomedical Knowledge Bases? [18.28724653601921]
We create the BioLAMA benchmark comprised of 49K biomedical factual knowledge triples for probing biomedical LMs.
We find that biomedical LMs with recently proposed probing methods can achieve up to 18.51% Acc@5 on retrieving biomedical knowledge.
arXiv Detail & Related papers (2021-09-15T08:34:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.