Medico 2025: Visual Question Answering for Gastrointestinal Imaging
- URL: http://arxiv.org/abs/2508.10869v1
- Date: Thu, 14 Aug 2025 17:43:46 GMT
- Title: Medico 2025: Visual Question Answering for Gastrointestinal Imaging
- Authors: Sushant Gautam, Vajira Thambawita, Michael Riegler, Pål Halvorsen, Steven Hicks
- Abstract summary: The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions.
- Score: 2.8271229358498595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image analysis. Instructions, data access, and an updated guide for participation are available in the official competition repository: https://github.com/simula/MediaEval-Medico-2025
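The abstract says the challenge combines quantitative performance metrics with expert-reviewed explainability assessments. As a minimal sketch of the quantitative side (this is my own illustration, not the official evaluation script; the metric choices and QA-pair layout are assumptions), open-ended VQA answers are commonly scored with exact match and token-level F1:

```python
from collections import Counter

def normalize(ans: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count as errors."""
    return " ".join(ans.lower().split())

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall, as in SQuAD-style QA scoring."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def score(qa_pairs):
    """qa_pairs: iterable of (predicted_answer, reference_answer) tuples."""
    ems = [exact_match(p, g) for p, g in qa_pairs]
    f1s = [token_f1(p, g) for p, g in qa_pairs]
    return {"exact_match": sum(ems) / len(ems), "token_f1": sum(f1s) / len(f1s)}
```

A partially correct answer such as "no abnormality" against the reference "no abnormality found" gets exact match 0 but token F1 0.8, which is why both metrics are usually reported together.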
Related papers
- MedQARo: A Large-Scale Benchmark for Medical Question Answering in Romanian [50.767415194856135]
We introduce MedQARo, the first large-scale medical QA benchmark in Romanian. We construct a high-quality, large-scale dataset comprising 102,646 QA pairs related to cancer patients.
arXiv Detail & Related papers (2025-08-22T13:48:37Z)
- Querying GI Endoscopy Images: A VQA Approach [0.0]
VQA (Visual Question Answering) combines Natural Language Processing (NLP) with image understanding to answer questions about a given image. This study, a submission to ImageCLEFmed-MEDVQA-GI 2025 subtask 1, explores adapting the Florence2 model to answer medical visual questions on GI endoscopy images.
arXiv Detail & Related papers (2025-07-25T13:03:46Z)
- Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025 [0.0]
This paper describes our approach to Subtask 1 of the ImageCLEFmed MEDVQA 2025 Challenge. We adopt the Florence model, a large-scale multimodal foundation model, as the backbone of our VQA pipeline. Experiments on the Kvasir dataset show that fine-tuning Florence yields accurate responses on the official challenge metrics.
arXiv Detail & Related papers (2025-07-19T09:04:13Z)
- Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering [45.058569118999436]
Given a pair of main and reference images, this task attempts to answer several questions on both diseases.
We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images.
arXiv Detail & Related papers (2023-07-22T05:34:18Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
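The entry above describes aligning features from a pre-trained vision encoder with a large language model. A common way to do this, shown here as a generic numpy sketch (my own illustration, not the PMC-VQA implementation; names and dimensions are assumptions), is a learned linear projection that maps visual patch features into the language model's embedding space, after which they are prepended to the text token embeddings:

```python
import numpy as np

def project_visual_tokens(vis_feats: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map vision-encoder patch features (n_patches, d_vis) into the language
    model's embedding space (n_patches, d_lm) via a learned linear projection."""
    return vis_feats @ W + b

def build_multimodal_input(vis_feats, text_emb, W, b):
    """Prepend projected visual tokens to the text token embeddings, giving the
    language model a single (n_patches + n_tokens, d_lm) input sequence."""
    visual_tokens = project_visual_tokens(vis_feats, W, b)
    return np.concatenate([visual_tokens, text_emb], axis=0)
```

In practice the projection (and often the language model) is trained on image-QA pairs while the vision encoder stays frozen; the sketch only shows the interface, not the training loop.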
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [45.746882253686856]
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images.
We first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images.
Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs.
arXiv Detail & Related papers (2023-02-19T17:46:16Z)
- Self-supervised vision-language pretraining for Medical visual question answering [9.073820229958054]
We propose a self-supervised method that applies Masked image modeling, Masked language modeling, Image text matching and Image text alignment via contrastive learning (M2I2) for pretraining.
The proposed method achieves state-of-the-art performance on all the three public medical VQA datasets.
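The image-text alignment via contrastive learning mentioned in the M2I2 summary is typically implemented as a symmetric InfoNCE loss over a batch of paired image and text embeddings. The numpy sketch below is my own illustration of that standard objective (not the authors' code; the temperature value is an assumption): matched pairs sit on the diagonal of the similarity matrix and are pushed to score higher than all mismatched pairs.

```python
import numpy as np

def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: row i of img_emb and row i of txt_emb are a
    positive pair; every other combination in the batch is a negative."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(logits))             # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(l)), labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other texts in the batch, which is what makes the pretrained encoders useful for downstream VQA.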
arXiv Detail & Related papers (2022-11-24T13:31:56Z)
- Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users [51.644376281196394]
When using medical images for diagnosis, it is important that the images are of high quality. In telemedicine, a common problem is that a quality issue is only flagged after the patient has left the clinic, meaning they must return to have the exam redone. This paper reports on the development of an AI system for flagging and explaining low-quality medical images in real time.
arXiv Detail & Related papers (2022-07-06T14:53:26Z)
- Medical Visual Question Answering: A Survey [55.53205317089564]
Medical Visual Question Answering (VQA) combines medical artificial intelligence with the popular VQA challenge.
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
arXiv Detail & Related papers (2021-11-19T05:55:15Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions authored for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present MurKe, a Multi-step reasoning with Knowledge extraction framework that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
- PathVQA: 30000+ Questions for Medical Visual Question Answering [15.343890121216335]
To the best of our knowledge, this is the first dataset for pathology VQA. Our dataset will be released publicly to promote research in medical VQA.
arXiv Detail & Related papers (2020-03-07T17:55:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.