VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
- URL: http://arxiv.org/abs/2511.00504v2
- Date: Sun, 09 Nov 2025 09:03:27 GMT
- Title: VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
- Authors: Dang H. Nguyen, Hieu H. Pham, Hao T. Nguyen
- Abstract summary: VinDr-CXR-VQA is a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations.
- Score: 3.4998703934432682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations. Our question taxonomy spans six diagnostic types (Where, What, Is there, How many, Which, and Yes/No), capturing diverse clinical intents. To improve reliability, we construct a balanced distribution of 41.7% positive and 58.3% negative samples, mitigating hallucinations in normal cases. Benchmarking with MedGemma-4B-it demonstrates improved performance (F1 = 0.624, +11.8% over baseline) while enabling lesion localization. VinDr-CXR-VQA aims to advance reproducible and clinically grounded Med-VQA research. The dataset and evaluation tools are publicly available at huggingface.co/datasets/Dangindev/VinDR-CXR-VQA.
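The reported benchmark score (F1 = 0.624 with MedGemma-4B-it) implies a text-matching metric between predicted and reference answers. The abstract does not describe the official evaluation script, so the following is only a minimal sketch of the token-overlap F1 commonly used in VQA benchmarks; the function name, tokenization, and scoring details are assumptions, not the dataset's published tooling:

```python
def answer_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    A common VQA-style metric (assumed here, not the official
    VinDr-CXR-VQA evaluation script).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise no overlap is possible.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection: count each reference token at most as
    # many times as it appears in the reference.
    ref_counts = {}
    for t in ref_tokens:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1

    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a partially correct localization answer.
print(round(answer_f1("left pleural effusion", "pleural effusion"), 3))  # 0.8
```

For Yes/No questions this reduces to exact-match accuracy. The dataset itself can presumably be fetched with `datasets.load_dataset("Dangindev/VinDR-CXR-VQA")`, though the exact field schema should be checked on the Hugging Face page.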
Related papers
- Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations [19.488236277427358]
Vision-language models (VLMs) often produce chain-of-thought (CoT) explanations that sound plausible yet fail to reflect the underlying decision process. We present a clinically grounded framework for chest X-ray visual question answering (VQA) that probes CoT faithfulness via controlled text and image modifications.
arXiv Detail & Related papers (2025-10-13T09:28:22Z)
- Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy [3.3091869879941687]
We introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems.
arXiv Detail & Related papers (2025-06-11T17:31:38Z)
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning [18.15610003617933]
We present CXRTrek, a new multi-stage visual question answering (VQA) dataset for chest X-ray (CXR) interpretation. The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings. We propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the framework.
arXiv Detail & Related papers (2025-05-29T06:30:40Z)
- GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis [44.76975131560712]
We introduce a large-scale, Groundable, and Explainable Medical VQA benchmark for chest X-ray diagnosis (GEMeX). With 151,025 images and 1,605,575 questions, GEMeX is currently the largest chest X-ray VQA dataset.
arXiv Detail & Related papers (2024-11-25T07:36:46Z)
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- Instrumental Variable Learning for Chest X-ray Classification [52.68170685918908]
We propose an interpretable instrumental variable (IV) learning framework to eliminate the spurious association and obtain accurate causal representation.
Our approach's performance is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets.
arXiv Detail & Related papers (2023-05-20T03:12:23Z)
- CIRCA: comprehensible online system in support of chest X-rays-based COVID-19 diagnosis [37.41181188499616]
Deep learning techniques can help in the faster detection of COVID-19 cases and monitoring of disease progression.
Five different datasets were used to construct a representative dataset of 23,799 CXRs for model training.
A U-Net-based model was developed to identify a clinically relevant region of the CXR.
arXiv Detail & Related papers (2022-10-11T13:30:34Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, they encode no prior knowledge of local connectivity.
Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children [0.31317409221921133]
We release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021.
The dataset was labeled for the presence of 36 critical findings and 15 diseases.
arXiv Detail & Related papers (2022-03-20T18:03:11Z)
- The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning [57.00601760750389]
We present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.
Such a tool can gauge severity of COVID-19 lung infections that can be used for escalation or de-escalation of care.
arXiv Detail & Related papers (2020-05-24T23:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.