Medical Visual Question Answering: A Survey
- URL: http://arxiv.org/abs/2111.10056v3
- Date: Wed, 7 Jun 2023 00:37:15 GMT
- Title: Medical Visual Question Answering: A Survey
- Authors: Zhihong Lin, Donghao Zhang, Qingyi Tao, Danli Shi, Gholamreza Haffari,
Qi Wu, Mingguang He, and Zongyuan Ge
- Abstract summary: Medical Visual Question Answering (VQA) combines medical artificial intelligence with the popular VQA challenge.
Given a medical image and a clinically relevant question in natural language, a medical VQA system is expected to predict a plausible and convincing answer.
- Score: 55.53205317089564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical Visual Question Answering (VQA) combines medical
artificial intelligence with the popular VQA challenge. Given a medical image
and a clinically relevant question in natural language, a medical VQA system is
expected to predict a plausible and convincing answer. Although general-domain
VQA has been extensively studied, medical VQA still requires dedicated
investigation because of its task-specific characteristics. In the first part
of this survey, we collect and discuss the publicly available medical VQA
datasets to date, covering their data sources, data quantities, and task
features. In the second part, we review the approaches used for medical VQA
tasks, summarizing and discussing their techniques, innovations, and potential
improvements. In the last part, we analyze medical-specific challenges for the
field and discuss future research directions. Our goal is to provide
comprehensive and helpful information for researchers interested in medical
visual question answering and to encourage further research in this field.
Related papers
- Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis [4.964280449393689]
We investigate the construction of a more cohesive and stable Med-VQA structure.
Motivated by causal effect, we propose a novel Triangular Reasoning VQA framework.
arXiv Detail & Related papers (2024-06-21T10:50:55Z)
- From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities [2.0681376988193843]
The work presents a survey in the domain of Visual Question Answering (VQA) that delves into the intricacies of VQA datasets and methods over the field's history.
We further generalize VQA to multimodal question answering, explore tasks related to VQA, and present a set of open problems for future investigation.
arXiv Detail & Related papers (2023-11-01T05:39:41Z)
- Visual Question Answering in the Medical Domain [13.673890873313354]
We present a novel contrastive learning pretraining method to mitigate the problem of small datasets for the Med-VQA task.
Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, comparable to other state-of-the-art Med-VQA models.
arXiv Detail & Related papers (2023-09-20T06:06:10Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and ImageCLEF 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [45.746882253686856]
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images.
We first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images.
Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs.
arXiv Detail & Related papers (2023-02-19T17:46:16Z)
- Video Question Answering: Datasets, Algorithms and Challenges [99.9179674610955]
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.
This paper provides a clear taxonomy and comprehensive analyses of VideoQA, focusing on the datasets, algorithms, and unique challenges.
arXiv Detail & Related papers (2022-03-02T16:34:09Z)
- Achieving Human Parity on Visual Question Answering [67.22500027651509]
The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image.
This paper describes our recent research on AliceMind-MMU, which obtains similar or even slightly better results than human beings do on VQA.
This is achieved by systematically improving the VQA pipeline, including: (1) pre-training with comprehensive visual and textual feature representation; (2) effective cross-modal interaction with learning to attend; and (3) a novel knowledge mining framework with specialized expert modules for the complex VQA task.
arXiv Detail & Related papers (2021-11-17T04:25:11Z)
- A survey on VQA_Datasets and Approaches [0.0]
Visual question answering (VQA) is a task that combines the techniques of computer vision and natural language processing.
This paper will review and analyze existing datasets, metrics, and models proposed for the VQA task.
arXiv Detail & Related papers (2021-05-02T08:50:30Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions from the public healthcare specialization exam.
These questions are among the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe), striving to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.