Related papers: Evaluating Prompting Strategies with MedGemma for Medical Order Extraction

Evaluating Prompting Strategies with MedGemma for Medical Order Extraction

URL: http://arxiv.org/abs/2511.10583v1
Date: Fri, 14 Nov 2025 01:58:57 GMT
Title: Evaluating Prompting Strategies with MedGemma for Medical Order Extraction
Authors: Abhinand Balachandran, Bavana Durgapraveen, Gowsikkan Sikkan Sudhagar, Vidhya Varshany J S, Sriram Rajkumar,
Abstract summary: This paper details our team submission to the MEDIQA-OE-2025 Shared Task.<n>We investigate the performance of MedGemma, a new domain-specific open-source language model, for structured order extraction.
Score: 6.312830352605384
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The accurate extraction of medical orders from doctor-patient conversations is a critical task for reducing clinical documentation burdens and ensuring patient safety. This paper details our team submission to the MEDIQA-OE-2025 Shared Task. We investigate the performance of MedGemma, a new domain-specific open-source language model, for structured order extraction. We systematically evaluate three distinct prompting paradigms: a straightforward one-Shot approach, a reasoning-focused ReAct framework, and a multi-step agentic workflow. Our experiments reveal that while more complex frameworks like ReAct and agentic flows are powerful, the simpler one-shot prompting method achieved the highest performance on the official validation set. We posit that on manually annotated transcripts, complex reasoning chains can lead to "overthinking" and introduce noise, making a direct approach more robust and efficient. Our work provides valuable insights into selecting appropriate prompting strategies for clinical information extraction in varied data conditions.

Related papers

MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture [8.072932739333309]
We introduce a collaborative multi-agent system (MAS) that models a clinical consultation team to address this gap.<n>The system is tasked with identifying clinical problems by analyzing only the Subjective (S) and Objective (O) sections of SOAP notes.<n>A Manager agent orchestrates a dynamically assigned team of specialist agents who engage in a hierarchical, iterative debate to reach a consensus.
arXiv Detail & Related papers (2025-08-29T17:31:24Z)
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning [60.03671205298294]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images.<n>Current methods still suffer from limited answer reliability and poor interpretability.<n>This work first proposes a Region-Aware Multimodal Chain-of-Thought dataset, in which the process of producing an answer is preceded by a sequence of intermediate reasoning steps.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement.<n>Inspired by the catfish effect'' in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z)
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks [27.717720332927296]
We introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches.<n> MedAgentBoard encompasses four diverse medical task categories: medical (visual) question answering, lay summary generation, structured Electronic Health Record (EHR) predictive modeling, and clinical workflow automation.<n>Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, it does not consistently outperform advanced single LLMs.
arXiv Detail & Related papers (2025-05-18T11:28:17Z)
Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models [11.510561099220872]
Task as Context (TACO) Prompting is a novel framework that unifies extraction and linking tasks by embedding task-specific context into prompts.<n>Our study also introduces SYMPCODER, a human-annotated dataset derived from Vaccine Adverse Event Reporting System (VAERS) reports.
arXiv Detail & Related papers (2025-04-03T21:57:17Z)
SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence [16.584722724845182]
Integration of Vision-Language Models in surgical intelligence is hindered by hallucinations, domain knowledge gaps, and limited understanding of task interdependencies.<n>We present SurgRAW, a CoT-driven multi-agent framework that delivers transparent, interpretable insights for most tasks in robotic-assisted surgery.
arXiv Detail & Related papers (2025-03-13T11:23:13Z)
Medchain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence [68.05876437208505]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow.<n>We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and a MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z)
Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering [24.43605359639671]
We propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN. It contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. We implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning.
arXiv Detail & Related papers (2024-03-07T20:48:40Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models [6.252236971703546]
An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue. This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks.
arXiv Detail & Related papers (2023-05-10T08:48:53Z)
Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.