VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
- URL: http://arxiv.org/abs/2511.00504v2
- Date: Sun, 09 Nov 2025 09:03:27 GMT
- Title: VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
- Authors: Dang H. Nguyen, Hieu H. Pham, Hao T. Nguyen
- Abstract summary: VinDr-CXR-VQA is a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations.
- Score: 3.4998703934432682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations. Our question taxonomy spans six diagnostic types (Where, What, Is there, How many, Which, and Yes/No), capturing diverse clinical intents. To improve reliability, we construct a balanced distribution of 41.7% positive and 58.3% negative samples, mitigating hallucinations in normal cases. Benchmarking with MedGemma-4B-it demonstrates improved performance (F1 = 0.624, +11.8% over baseline) while enabling lesion localization. VinDr-CXR-VQA aims to advance reproducible and clinically grounded Med-VQA research. The dataset and evaluation tools are publicly available at huggingface.co/datasets/Dangindev/VinDR-CXR-VQA.
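The reported benchmark score (F1 = 0.624 with MedGemma-4B-it) implies a text-matching metric between predicted and reference answers. The abstract does not describe the official evaluation script, so the following is only a minimal sketch of the token-overlap F1 commonly used in VQA benchmarks; the function name, tokenization, and scoring details are assumptions, not the dataset's published tooling:

```python
def answer_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    A common VQA-style metric (assumed here, not the official
    VinDr-CXR-VQA evaluation script).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise no overlap is possible.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection: count each reference token at most as
    # many times as it appears in the reference.
    ref_counts = {}
    for t in ref_tokens:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1

    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a partially correct localization answer.
print(round(answer_f1("left pleural effusion", "pleural effusion"), 3))  # 0.8
```

For Yes/No questions this reduces to exact-match accuracy. The dataset itself can presumably be fetched with `datasets.load_dataset("Dangindev/VinDR-CXR-VQA")`, though the exact field schema should be checked on the Hugging Face page.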
Related papers
- Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations [19.488236277427358]
Vision-language models (VLMs) often produce chain-of-thought (CoT) explanations that sound plausible yet fail to reflect the underlying decision process. We present a clinically grounded framework for chest X-ray visual question answering (VQA) that probes CoT faithfulness via controlled text and image modifications.
arXiv Detail & Related papers (2025-10-13T09:28:22Z)
- Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy [3.3091869879941687]
We introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems.
arXiv Detail & Related papers (2025-06-11T17:31:38Z)
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning [18.15610003617933]
We present CXRTrek, a new multi-stage visual question answering (VQA) dataset for chest X-ray (CXR) interpretation. The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings. We propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the framework.
arXiv Detail & Related papers (2025-05-29T06:30:40Z)
- GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis [44.76975131560712]
We introduce a large-scale, Groundable, and Explainable Medical VQA benchmark for chest X-ray diagnosis (GEMeX). With 151,025 images and 1,605,575 questions, GEMeX is currently the largest chest X-ray VQA dataset.
arXiv Detail & Related papers (2024-11-25T07:36:46Z)
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- Instrumental Variable Learning for Chest X-ray Classification [52.68170685918908]
We propose an interpretable instrumental variable (IV) learning framework to eliminate the spurious association and obtain accurate causal representation.
Our approach's performance is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets.
arXiv Detail & Related papers (2023-05-20T03:12:23Z)
- CIRCA: comprehensible online system in support of chest X-rays-based COVID-19 diagnosis [37.41181188499616]
Deep learning techniques can help in the faster detection of COVID-19 cases and monitoring of disease progression.
Five different datasets were used to construct a representative dataset of 23,799 CXRs for model training.
A U-Net-based model was developed to identify a clinically relevant region of the CXR.
arXiv Detail & Related papers (2022-10-11T13:30:34Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, they encode no prior knowledge of local connectivity.
Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children [0.31317409221921133]
We release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021.
The dataset was labeled for the presence of 36 critical findings and 15 diseases.
arXiv Detail & Related papers (2022-03-20T18:03:11Z)
- The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning [57.00601760750389]
We present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.
Such a tool can gauge severity of COVID-19 lung infections that can be used for escalation or de-escalation of care.
arXiv Detail & Related papers (2020-05-24T23:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.