Related papers: MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

URL: http://arxiv.org/abs/2505.16209v2
Date: Fri, 23 May 2025 01:20:08 GMT
Title: MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering
Authors: Shuchang Ye, Usman Naseem, Mingyuan Meng, Dagan Feng, Jinman Kim,
Abstract summary: Existing MedVQA models suffered from modality preference bias, where predictions are heavily dominated by one modality while overlooking the other.<n>We propose a Medical CounterFactual VQA (MedCFVQA) model, which trains with bias and leverages causal graphs to eliminate the modality preference bias during inference.<n>We show that MedCFVQA significantly outperforms its non-causal counterpart on both SLAKE, RadVQA and SLAKE-CP, RadVQA-CP datasets.
Score: 13.506155313741493
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffered from modality preference bias, where predictions are heavily dominated by one modality while overlooking the other (in MedVQA, usually questions dominate the answer but images are overlooked), thereby failing to learn multimodal knowledge. To overcome the modality preference bias, we proposed a Medical CounterFactual VQA (MedCFVQA) model, which trains with bias and leverages causal graphs to eliminate the modality preference bias during inference. Existing MedVQA datasets exhibit substantial prior dependencies between questions and answers, which results in acceptable performance even if the model significantly suffers from the modality preference bias. To address this issue, we reconstructed new datasets by leveraging existing MedVQA datasets and Changed their P3rior dependencies (CP) between questions and their answers in the training and test set. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on both SLAKE, RadVQA and SLAKE-CP, RadVQA-CP datasets.

Related papers

GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images.<n>Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers.<n>This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps.<n>We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Structure Causal Models and LLMs Integration in Medical Visual Question Answering [42.54219413108453]
We propose a causal inference framework for the MedVQA task.<n>We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements.<n>Our method achieves true causal correlations in the face of complex medical data.
arXiv Detail & Related papers (2025-05-05T14:57:02Z)
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA. We first augment the existing data via deliberate perturbations on either the image or question. We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
Visual Question Answering in the Medical Domain [13.673890873313354]
We present a novel contrastive learning pretraining method to mitigate the problem of small datasets for the Med-VQA task. Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, giving comparable results to other state-of-the-art Med-VQA models.
arXiv Detail & Related papers (2023-09-20T06:06:10Z)
Improving Selective Visual Question Answering by Learning from Your Peers [74.20167944693424]
Visual Question Answering (VQA) models can have difficulties abstaining from answering when they are wrong. We propose Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions. Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model.
arXiv Detail & Related papers (2023-06-14T21:22:01Z)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery. We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
Medical Question Summarization with Entity-driven Contrastive Learning [12.008269098530386]
This paper proposes a novel medical question summarization framework using entity-driven contrastive learning (ECL) ECL employs medical entities in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples. We find that some MQA datasets suffer from serious data leakage problems, such as the iCliniq dataset's 33% duplicate rate.
arXiv Detail & Related papers (2023-04-15T00:19:03Z)
Consistency-preserving Visual Question Answering in Medical Imaging [2.005299372367689]
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. We propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process. Our experiments show that our method outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2022-06-27T13:38:50Z)
Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
We stress the language bias in Visual Question Answering (VQA) that comes from two aspects, i.e., distribution bias and shortcut bias. We propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. GGE forces the biased models to over-fit the biased data distribution in priority, thus makes the base model pay more attention to examples that are hard to solve by biased models.
arXiv Detail & Related papers (2021-07-27T08:02:49Z)
Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples. We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure. This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients. Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis [142.90770786804507]
Medical diagnosis assistant (MDA) aims to build an interactive diagnostic agent to sequentially inquire about symptoms for discriminating diseases. This work attempts to address these critical issues in MDA by taking advantage of the causal diagram. We propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records.
arXiv Detail & Related papers (2020-03-14T02:05:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.