LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction
- URL: http://arxiv.org/abs/2408.12249v1
- Date: Thu, 22 Aug 2024 09:37:40 GMT
- Title: LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction
- Authors: Aishik Nagar, Viktor Schlegel, Thanh-Tung Nguyen, Hao Li, Yuping Wu, Kuluhan Binici, Stefan Winkler
- Abstract summary: Large Language Models (LLMs) are increasingly adopted for applications in healthcare.
It is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain.
- Score: 13.965777046473885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs' task knowledge and reasoning capabilities, their (parametric) domain knowledge, and the addition of external knowledge. To this end, we evaluate various open LLMs -- including BioMistral and Llama-2 models -- on a diverse set of biomedical datasets, using standard prompting, Chain-of-Thought (CoT) and Self-Consistency-based reasoning, as well as Retrieval-Augmented Generation (RAG) with PubMed and Wikipedia corpora. Counter-intuitively, our results reveal that standard prompting consistently outperforms more complex techniques across both tasks, laying bare the limitations in the current application of CoT, self-consistency and RAG in the biomedical domain. Our findings suggest that advanced prompting methods developed for knowledge- or reasoning-intensive tasks, such as CoT or RAG, are not easily portable to biomedical tasks where precise structured outputs are required. This highlights the need for more effective integration of external knowledge and reasoning mechanisms in LLMs to enhance their performance in real-world biomedical applications.
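The sketch below is not the authors' code; it is a minimal illustration, under stated assumptions, of the three prompting strategies the abstract says were benchmarked (standard prompting, Chain-of-Thought, and Self-Consistency), applied to a medical classification example. The `generate()` function is a placeholder for whatever open LLM backend is used (e.g. a locally served Llama-2 or BioMistral checkpoint); all function names and prompt wording here are hypothetical.

```python
# Hedged sketch of the prompting strategies compared in the paper:
# standard prompting vs. Chain-of-Thought (CoT) vs. Self-Consistency
# (majority vote over sampled CoT outputs) for medical classification.
from collections import Counter


def generate(prompt: str) -> str:
    """Placeholder for an LLM inference call; replace with a real client."""
    raise NotImplementedError("wire up your LLM backend here")


def standard_prompt(note: str, labels: list[str]) -> str:
    # Standard (direct) prompting: ask for the label only.
    return (
        f"Classify the clinical note into one of {labels}.\n"
        f"Note: {note}\n"
        f"Answer with the label only:"
    )


def cot_prompt(note: str, labels: list[str]) -> str:
    # Chain-of-Thought prompting: elicit reasoning before the final label.
    return (
        f"Classify the clinical note into one of {labels}.\n"
        f"Note: {note}\n"
        f"Let's think step by step, then give the final answer on the last "
        f"line as 'Label: <label>'."
    )


def parse_label(output: str, labels: list[str]) -> str:
    # Take the last label mentioned in the output; fall back to the first label.
    for line in reversed(output.strip().splitlines()):
        for label in labels:
            if label.lower() in line.lower():
                return label
    return labels[0]


def self_consistency(note: str, labels: list[str], n: int = 5) -> str:
    # Self-Consistency: sample n CoT completions and majority-vote the labels.
    votes = [parse_label(generate(cot_prompt(note, labels)), labels) for _ in range(n)]
    return Counter(votes).most_common(1)[0][0]
```

In the paper's setting, the structured-output requirement (exact labels or entity spans) is what makes the parsing step non-trivial; the reported result is that the simple `standard_prompt` variant tends to outperform the CoT and Self-Consistency variants on these biomedical tasks.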
Related papers
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z) - Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z) - Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.
Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration.
To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - A Survey for Large Language Models in Biomedicine [31.719451674137844]
This review is based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv.
We explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine.
We discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics.
arXiv Detail & Related papers (2024-08-29T12:39:16Z) - SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG)
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z) - Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning.
This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z) - Inspire the Large Language Model by External Knowledge on BioMedical Named Entity Recognition [3.427366431933441]
Large language models (LLMs) have demonstrated dominating performance in many NLP tasks, especially on generative tasks.
We leverage the LLM to solve the Biomedical NER task by decomposing it into entity span extraction and entity type determination.
Experimental results show a significant improvement of our two-step BioNER approach over the previous few-shot LLM baseline.
arXiv Detail & Related papers (2023-09-21T17:39:53Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation [0.0]
This work explores the potential of Large Language Models for dialoguing with biomedical background knowledge.
The framework involves three evaluation steps, each sequentially assessing different aspects: fluency, prompt alignment, semantic coherence, factual knowledge, and specificity of the generated responses.
The work provides a systematic assessment of the ability of eleven state-of-the-art LLMs, including ChatGPT, GPT-4 and Llama 2, in two prompting-based tasks.
arXiv Detail & Related papers (2023-05-28T22:46:21Z) - Large Language Models for Biomedical Knowledge Graph Construction: Information extraction from EMR notes [0.0]
We propose an end-to-end machine learning solution based on large language models (LLMs)
The entities used in the KG construction process are diseases, factors, treatments, as well as manifestations that the patient exhibits while experiencing the disease.
The application of the proposed methodology is demonstrated on age-related macular degeneration.
arXiv Detail & Related papers (2023-01-29T15:52:33Z)