Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
- URL: http://arxiv.org/abs/2507.09209v1
- Date: Sat, 12 Jul 2025 09:03:30 GMT
- Title: Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
- Authors: Xiao Liang, Di Wang, Zhicheng Jiao, Ronghan Li, Pengfei Yang, Quan Wang, Tat-Seng Chua
- Abstract summary: Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
- Score: 52.2001050216955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancements in Vision Language Models (VLMs) have prompted the development of multi-modal medical assistant systems. Despite this progress, current models still have inherent probabilistic uncertainties, often producing erroneous or unverified responses, an issue with serious implications in medical applications. Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. However, these training-dependent strategies are costly and still lack sufficient alignment with clinical expertise. To address these issues, we propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training. This framework introduces an uncertainty estimation strategy to identify unreliable outputs. It then retrieves relevant references to assist experts in highlighting key terms and applies classifier-free guidance to refine the token embeddings of MedVLM, ensuring that the adjusted outputs are correct and align with expert highlights. Evaluations across three medical visual question answering benchmarks demonstrate that the proposed Expert-CFG, with 4.2B parameters and limited expert annotations, outperforms state-of-the-art models with 13B parameters. The results demonstrate the feasibility of deploying such a system in resource-limited settings for clinical use.
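The two mechanisms named in the abstract, flagging unreliable outputs by uncertainty and steering generation with classifier-free guidance, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses mean token entropy as the uncertainty signal and the standard classifier-free guidance interpolation on logits, whereas the abstract describes guidance applied to token embeddings. The function names, the entropy threshold, and the guidance scale are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def token_entropy(logits):
    """Shannon entropy (in nats) of each token's predictive distribution."""
    p = softmax(logits)
    return -np.sum(p * np.log(p + 1e-12), axis=-1)


def is_uncertain(logits, threshold=1.0):
    """Flag a response whose mean token entropy exceeds a threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return float(np.mean(token_entropy(logits))) > threshold


def cfg_logits(cond, uncond, scale=1.5):
    """Standard classifier-free guidance combination.

    `cond` are logits computed with the expert-highlighted context,
    `uncond` are logits computed without it. scale=0 returns `uncond`,
    scale=1 returns `cond`, and scale>1 extrapolates toward `cond`.
    """
    return uncond + scale * (cond - uncond)


# Toy usage: a 2-token response over a 4-word vocabulary.
uncond = np.zeros((2, 4))                       # flat, maximally uncertain
cond = np.array([[4.0, 0.0, 0.0, 0.0],
                 [0.0, 4.0, 0.0, 0.0]])         # expert context favors one word
if is_uncertain(uncond):
    guided = cfg_logits(cond, uncond, scale=1.5)
    next_tokens = np.argmax(guided, axis=-1)
```

In this toy setup the guided logits are only computed when the unguided distribution is flagged as uncertain, which mirrors the abstract's description of involving the expert only for unreliable outputs.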
Related papers
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
- Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework [12.732588046754783]
We propose a collaborative framework that enhances the performance of open-source LMMs for eGFR forecasting. It incorporates visual knowledge transfer, abductive reasoning, and a short-term memory mechanism to enhance prediction accuracy and interpretability. Our method sheds new light on building AI systems for healthcare that combine predictive accuracy with clinically grounded interpretability.
arXiv Detail & Related papers (2025-07-30T08:11:06Z)
- KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs [39.47350988195002]
Large language models (LLMs) have shown promise in leveraging language abilities and biomedical knowledge for diagnosis prediction. We propose KERAP, a knowledge graph (KG)-enhanced reasoning approach that improves LLM-based diagnosis prediction through a multi-agent architecture. Our framework consists of a linkage agent for mapping, a retrieval agent for structured knowledge extraction, and a prediction agent that iteratively refines diagnosis predictions.
arXiv Detail & Related papers (2025-07-03T16:35:11Z)
- Prompt Mechanisms in Medical Imaging: A Comprehensive Survey [18.072753363565322]
Deep learning offers transformative potential in medical imaging. Yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models.
arXiv Detail & Related papers (2025-06-28T03:06:25Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation [31.061600616994145]
HDCEval is built on a set of fine-grained medical evaluation guidelines developed in collaboration with professional doctors. The framework decomposes complex evaluation tasks into specialized subtasks, each evaluated by expert models. This hierarchical approach ensures that each aspect of the evaluation is handled with expert precision, leading to a significant improvement in alignment with human evaluators.
arXiv Detail & Related papers (2025-01-12T07:30:49Z)
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- Beyond One-Time Validation: A Framework for Adaptive Validation of Prognostic and Diagnostic AI-based Medical Devices [55.319842359034546]
Existing approaches often fall short in addressing the complexity of practically deploying these devices.
The presented framework emphasizes the importance of repeating validation and fine-tuning during deployment.
It is positioned within the current US and EU regulatory landscapes.
arXiv Detail & Related papers (2024-09-07T11:13:52Z)
- Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models [29.05425041393475]
Generative Large Language Models (LLMs) hold significant promise in healthcare.
This study assessed the potential of LLMs to function as autonomous agents in a simulated tertiary care medical center.
arXiv Detail & Related papers (2024-01-05T15:09:57Z)
- Distribution-Free Federated Learning with Conformal Predictions [0.0]
Federated learning aims to leverage separate institutional datasets while maintaining patient privacy.
Poor calibration and lack of interpretability may hamper widespread deployment of federated models into clinical practice.
We propose to address these challenges by incorporating an adaptive conformal framework into federated learning.
arXiv Detail & Related papers (2021-10-14T18:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.