Related papers: MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning

MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning

URL: http://arxiv.org/abs/2509.24314v1
Date: Mon, 29 Sep 2025 05:51:25 GMT
Title: MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning
Authors: Hongjun Liu, Yinghao Zhu, Yuhui Wang, Yitao Long, Zeyu Lai, Lequan Yu, Chen Zhao,
Abstract summary: We introduce MedMMV, a controllable multi-agent framework for reliable and verifiable clinical reasoning.<n>On six medical benchmarks, MedMMV improves accuracy by up to 12.7% and, more critically, demonstrates superior reliability.
Score: 35.97057940590796
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in multimodal large language models (MLLMs) has demonstrated promising performance on medical benchmarks and in preliminary trials as clinical assistants. Yet, our pilot audit of diagnostic cases uncovers a critical failure mode: instability in early evidence interpretation precedes hallucination, creating branching reasoning trajectories that cascade into globally inconsistent conclusions. This highlights the need for clinical reasoning agents that constrain stochasticity and hallucination while producing auditable decision flows. We introduce MedMMV, a controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning. MedMMV stabilizes reasoning through diversified short rollouts, grounds intermediate steps in a structured evidence graph under the supervision of a Hallucination Detector, and aggregates candidate paths with a Combined Uncertainty scorer. On six medical benchmarks, MedMMV improves accuracy by up to 12.7% and, more critically, demonstrates superior reliability. Blind physician evaluations confirm that MedMMV substantially increases reasoning truthfulness without sacrificing informational content. By controlling instability through a verifiable, multi-agent process, our framework provides a robust path toward deploying trustworthy AI systems in high-stakes domains like clinical decision support.

Related papers

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification [60.18369393468405]
Existing verifiers usually underperform owing to a lack of domain knowledge and limited calibration.<n>GLEAN compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals.<n>We empirically validate GLEAN with agentic clinical diagnosis across three diseases from the MIMIC-IV dataset.
arXiv Detail & Related papers (2026-03-03T09:36:43Z)
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework [29.22693846221723]
We introduce CARE, advancing Clinical Accountability in multi-modal medical Reasoning with an Evidence-grounded agentic framework.<n> CARE decomposes the task into coordinated sub-modules to reduce shortcut learning and hallucination.<n>Our CARE-Flow improves average accuracy by 10.9% over the same size (10B) state-of-the-art (SOTA)
arXiv Detail & Related papers (2026-03-02T08:38:37Z)
A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing [0.4349324020366305]
Large language models (LLMs) show promise for healthcare question answering, but clinical use is limited by weak verification, insufficient evidence grounding, and unreliable confidence signalling.<n>We propose a multi-agent medical QA framework that combines complementary LLMs with evidence retrieval, uncertainty estimation, and bias checks to improve answer reliability.
arXiv Detail & Related papers (2026-02-15T14:17:27Z)
Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis [29.630872344186873]
Interactive medical consultation requires an agent to proactively elicit missing clinical evidence under uncertainty.<n>Existing evaluations largely remain static or outcome-centric, neglecting the evidence-gathering process.<n>We propose an interactive evaluation framework that explicitly models the consultation process using a simulated patient and a revsimulated reporter grounded in atomic evidences.
arXiv Detail & Related papers (2026-01-27T16:36:35Z)
AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists.<n>By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback.<n> Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z)
Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation [97.36081721024728]
We propose the first benchmark for assessing confidence in multi-turn interaction during realistic medical consultations.<n>Our benchmark unifies three types of medical data for open-ended diagnostic generation.<n>We present MedConf, an evidence-grounded linguistic self-assessment framework.
arXiv Detail & Related papers (2026-01-22T04:51:39Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems [28.028343705313805]
Large language model (LLM)-based multi-agent systems show promise in simulating medical consultations.<n>But their evaluation is often confined to final-answer accuracy.<n>This practice treats their internal collaborative processes as opaque "black boxes"
arXiv Detail & Related papers (2025-10-11T11:48:57Z)
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks [21.203358914772465]
Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear.<n>We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark designed to probe the limits of multimodal clinical reasoning in neurology.
arXiv Detail & Related papers (2025-09-26T12:20:01Z)
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement.<n>Inspired by the catfish effect'' in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z)
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow [14.478357882578234]
In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data.<n>Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis.<n>We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine.
arXiv Detail & Related papers (2025-03-21T14:04:18Z)
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning [29.84956540178252]
Reasoning is a critical frontier for advancing medical image analysis.<n>We introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning.<n>MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks.
arXiv Detail & Related papers (2025-02-26T23:57:34Z)
MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.<n>It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.<n> Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
Reliable Multimodality Eye Disease Screening via Mixture of Student's t Distributions [49.4545260500952]
We introduce a novel multimodality evidential fusion pipeline for eye disease screening, EyeMoSt. Our model estimates both local uncertainty for unimodality and global uncertainty for the fusion modality to produce reliable classification results. Our experimental findings on both public and in-house datasets show that our model is more reliable than current methods.
arXiv Detail & Related papers (2023-03-17T06:18:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.