MedPAO: A Protocol-Driven Agent for Structuring Medical Reports
- URL: http://arxiv.org/abs/2510.04623v1
- Date: Mon, 06 Oct 2025 09:32:23 GMT
- Title: MedPAO: A Protocol-Driven Agent for Structuring Medical Reports
- Authors: Shrish Shrinath Vaidya, Gowthamaan Palani, Sidharth Ramesh, Velmurugan Balasubramanian, Minmini Selvam, Gokulraja Srinivasaraja, Ganapathy Krishnamurthi
- Abstract summary: We introduce MedPAO, a novel agentic framework that ensures accuracy and verifiable reasoning. MedPAO decomposes the report structuring task into a transparent process managed by a Plan-Act-Observe (PAO) loop and specialized tools. The efficacy of our approach is demonstrated through rigorous evaluation: MedPAO achieves an F1-score of 0.96 on the critical sub-task of concept categorization.
- Score: 0.13029689752120577
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The deployment of Large Language Models (LLMs) for structuring clinical data is critically hindered by their tendency to hallucinate facts and their inability to follow domain-specific rules. To address this, we introduce MedPAO, a novel agentic framework that ensures accuracy and verifiable reasoning by grounding its operation in established clinical protocols such as the ABCDEF protocol for CXR analysis. MedPAO decomposes the report structuring task into a transparent process managed by a Plan-Act-Observe (PAO) loop and specialized tools. This protocol-driven method provides a verifiable alternative to opaque, monolithic models. The efficacy of our approach is demonstrated through rigorous evaluation: MedPAO achieves an F1-score of 0.96 on the critical sub-task of concept categorization. Notably, expert radiologists and clinicians rated the final structured outputs with an average score of 4.52 out of 5, indicating a level of reliability that surpasses baseline approaches relying solely on LLM-based foundation models. The code is available at: https://github.com/MiRL-IITM/medpao-agent
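The Plan-Act-Observe loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the linked repository for that): the tool names `extract_concepts` and `categorize`, the `State` class, and the keyword table are all hypothetical, and the three section labels are placeholders rather than the ABCDEF protocol definition used in the paper. A real system would presumably use an LLM for planning and extraction; here both are replaced by simple rules so the control flow is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword table: placeholder labels, NOT the paper's ABCDEF definition.
PROTOCOL_KEYWORDS = {
    "A (airway)": ["trachea", "airway"],
    "B (breathing)": ["lung", "pleural"],
    "C (cardiac)": ["heart", "cardiac"],
}


@dataclass
class State:
    """Everything the agent has observed so far."""
    report: str
    concepts: Optional[list] = None
    structured: Optional[dict] = None
    trace: list = field(default_factory=list)  # ordered log of tools used


def extract_concepts(state: State) -> list:
    """Toy tool: treat each sentence of the report as one concept."""
    return [s.strip() for s in state.report.split(".") if s.strip()]


def categorize(state: State) -> dict:
    """Toy tool: assign each concept to a protocol section by keyword match."""
    out: dict = {}
    for concept in state.concepts:
        text = concept.lower()
        label = next(
            (lab for lab, kws in PROTOCOL_KEYWORDS.items()
             if any(k in text for k in kws)),
            "unassigned",
        )
        out.setdefault(label, []).append(concept)
    return out


TOOLS = {"extract_concepts": extract_concepts, "categorize": categorize}


def plan(state: State) -> Optional[str]:
    """Plan: pick the next tool from the current state, or None when done."""
    if state.concepts is None:
        return "extract_concepts"
    if state.structured is None:
        return "categorize"
    return None


def run_pao(report: str, max_steps: int = 10) -> State:
    state = State(report)
    for _ in range(max_steps):
        tool = plan(state)            # Plan
        if tool is None:
            break
        result = TOOLS[tool](state)   # Act
        if tool == "extract_concepts":  # Observe: fold the result into the state
            state.concepts = result
        else:
            state.structured = result
        state.trace.append(tool)
    return state


if __name__ == "__main__":
    final = run_pao("The trachea is midline. The lungs are clear. "
                    "Heart size is normal. No acute fracture.")
    print(final.trace)
    print(final.structured)
```

Because every step is recorded in `trace`, the sequence of tool calls that produced the structured output can be inspected afterwards, which is the kind of transparency the abstract contrasts with a single monolithic model call.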
Related papers
- MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization [46.65200216642429]
We introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%.
arXiv Detail & Related papers (2026-02-01T07:56:10Z)
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems. We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models [122.58252919699122]
Mechanistic Interpretability (MI) has emerged as a vital approach to demystify the decision-making of Large Language Models (LLMs). We present a practical survey structured around the pipeline: "Awesomeinterventionable-MI-Survey"
arXiv Detail & Related papers (2026-01-20T14:23:23Z)
- MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation [23.22547135801011]
We propose a semantic-driven reinforcement learning (SRL) method for medical report generation. SRL encourages clinical-correctness-guided learning beyond imitation of language style. We evaluate Medical Report Generation with SRL on two datasets: IU X-Ray and MIMIC-CXR.
arXiv Detail & Related papers (2025-12-18T03:57:55Z)
- Can Molecular Foundation Models Know What They Don't Know? A Simple Remedy with Preference Optimization [54.22711328577149]
We introduce Molecular-Aligned Preference Instance Ranking (Mole-PAIR), a plug-and-play module that can be flexibly integrated with existing foundation models. We show that our approach significantly improves the OOD detection capabilities of existing molecular foundation models.
arXiv Detail & Related papers (2025-09-29T21:06:52Z)
- An Automated Retrieval-Augmented Generation LLaMA-4 109B-based System for Evaluating Radiotherapy Treatment Plans [2.2532577733932038]
We develop a retrieval-augmented generation (RAG) system powered by LLaMA-4 109B for automated, protocol-aware, and interpretable evaluation of radiotherapy treatment plans. The RAG system integrates three core modules: a retrieval engine optimized across five SentenceTransformer backbones, a percentile prediction component based on cohort similarity, and a clinical constraint checker.
arXiv Detail & Related papers (2025-09-25T03:18:31Z)
- Model selection meets clinical semantics: Optimizing ICD-10-CM prediction via LLM-as-Judge evaluation, redundancy-aware sampling, and section-aware fine-tuning [1.208527102371119]
We propose a modular framework for ICD-10 Clinical Modification (ICD-10-CM) code prediction. It addresses the challenges through principled model selection, redundancy-aware data sampling, and structured input design. The proposed framework provides a scalable, institution-ready solution for real-world deployment of automated medical coding systems.
arXiv Detail & Related papers (2025-09-23T09:35:05Z)
- Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation [0.2039123720459736]
We introduce a multi-agent reinforcement learning framework that serves as a benchmark and evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation.
arXiv Detail & Related papers (2025-09-22T04:31:27Z)
- An Agentic Model Context Protocol Framework for Medical Concept Standardization [5.12407270785129]
We develop a zero-training, hallucination-preventive mapping system based on the Model Context Protocol (MCP). The system enables explainable mapping and significantly improves efficiency and accuracy with minimal effort.
arXiv Detail & Related papers (2025-09-04T02:32:22Z)
- When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
- How Well Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
RadA-BenchPlat is an evaluation platform that benchmarks the performance of large language models (LLMs) in radiology environments. The platform also defines ten categories of tools for agent-driven task solving and evaluates seven leading LLMs.
arXiv Detail & Related papers (2024-12-12T18:20:16Z)
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.