Related papers: MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

URL: http://arxiv.org/abs/2404.12372v1
Date: Thu, 18 Apr 2024 17:53:19 GMT
Title: MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Authors: Xiaotang Gai, Chenyi Zhou, Jiaxiang Liu, Yang Feng, Jian Wu, Zuozhu Liu
Abstract summary: We devise a semi-automated annotation process to streamline data preparation and build new benchmark MedVQA datasets. The R-RAD and R-SLAKE datasets provide intermediate medical decision-making rationales. Our method can achieve an accuracy of 83.5% on R-RAD and 86.3% on R-SLAKE, significantly outperforming existing state-of-the-art baselines.
Score: 19.94415334436024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, represents a challenging task and significant advancement in healthcare. It helps medical experts swiftly interpret medical images, thereby enabling faster and more accurate diagnoses. However, the model interpretability and transparency of existing MedVQA solutions are often limited, posing challenges in understanding their decision-making processes. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build new benchmark MedVQA datasets, R-RAD and R-SLAKE. The R-RAD and R-SLAKE datasets provide intermediate medical decision-making rationales generated by multimodal large language models and human annotations for question-answering pairs in existing MedVQA datasets, i.e., VQA-RAD and SLAKE. Moreover, we design a novel framework which finetunes lightweight pretrained generative models by incorporating medical decision-making rationales into the training process. The framework includes three distinct strategies to generate decision outcomes and corresponding rationales, thereby clearly showcasing the medical decision-making process during reasoning. Extensive experiments demonstrate that our method can achieve an accuracy of 83.5% on R-RAD and 86.3% on R-SLAKE, significantly outperforming existing state-of-the-art baselines. Dataset and code will be released.

Related papers

GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers. This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps. We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards [21.831262938278915]
We introduce Med-PRM, a process reward modeling framework to verify each reasoning step against established medical knowledge bases. Med-PRM achieves state-of-the-art performance, improving the performance of base models by up to 13.50%. We demonstrate the generality of Med-PRM by integrating it in a plug-and-play fashion with strong policy models such as Meerkat.
arXiv Detail & Related papers (2025-06-13T05:36:30Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis. We focus on the selective prediction approach that allows the diagnosis system to abstain from providing the decision if it is not confident in the diagnosis. We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z)
LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models [18.6994780408699]
Large Language Models (LLMs) face significant challenges in medical question answering. We propose a novel approach incorporating similar case generation within a multi-agent medical question-answering system. Our method capitalizes on the model's inherent medical knowledge and reasoning capabilities, eliminating the need for additional training data.
arXiv Detail & Related papers (2024-12-31T19:55:45Z)
MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method. It is designed to enhance interpretability and accuracy in biomedical imaging inquiries. Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs). We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
Free Form Medical Visual Question Answering in Radiology [3.495246564946556]
Research in medical Visual Question Answering has been scant, only gaining momentum since 2018. Our research delves into the effective representation of radiology images and the joint learning of multimodal representations. Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T20:26:52Z)
MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries [16.101969130235055]
We introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset. This dataset combines Hindi-English codemixed medical queries with visual aids. Our dataset, code, and pre-trained models will be made publicly available.
arXiv Detail & Related papers (2024-01-03T07:58:25Z)
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
Visual Question Answering in the Medical Domain [13.673890873313354]
We present a novel contrastive learning pretraining method to mitigate the problem of small datasets for the Med-VQA task. Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, giving comparable results to other state-of-the-art Med-VQA models.
arXiv Detail & Related papers (2023-09-20T06:06:10Z)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery. We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
Large Language Models Encode Clinical Knowledge [21.630872464930587]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning.
arXiv Detail & Related papers (2022-12-26T14:28:24Z)
"My nose is running.""Are you also coughing?": Building A Medical Diagnosis Agent with Interpretable Inquiry Logics [80.55587329326046]
We propose a more interpretable decision process to implement the dialogue manager of DSMD. We devise a model with highly transparent components to conduct the inference. Experiments show that our method obtains 7.7%, 10.0%, 3.0% absolute improvement in diagnosis accuracy.
arXiv Detail & Related papers (2022-04-29T09:02:23Z)
Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions. We propose an end-to-end variational reasoning approach to medical dialogue generation. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.