SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
- URL: http://arxiv.org/abs/2102.09542v1
- Date: Thu, 18 Feb 2021 18:44:50 GMT
- Title: SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
- Authors: Bo Liu, Li-Ming Zhan, Li Xu, Lin Ma, Yan Yang, Xiao-Ming Wu
- Abstract summary: We present a large bilingual dataset, SLAKE, with comprehensive semantic labels annotated by experienced physicians.
In addition, SLAKE includes richer modalities and covers more human body parts than currently available datasets.
- Score: 29.496389523654596
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical visual question answering (Med-VQA) has tremendous potential in
healthcare. However, the development of this technology is hindered by the
lack of publicly available, high-quality labeled datasets for training and
evaluation. In this paper, we present a large bilingual dataset, SLAKE, with
comprehensive semantic labels annotated by experienced physicians and a new
structural medical knowledge base for Med-VQA. In addition, SLAKE includes
richer modalities and covers more human body parts than currently available
datasets. We show that SLAKE can be used to facilitate the development and
evaluation of Med-VQA systems. The dataset can be downloaded from
http://www.med-vqa.com/slake.
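As a quick orientation, here is a minimal sketch of loading SLAKE-style VQA annotations in Python. It assumes the release ships JSON files whose records carry fields such as img_name, question, answer, and q_lang; those field names and the train.json file name are assumptions to verify against the actual download at http://www.med-vqa.com/slake.

```python
# Minimal sketch of loading SLAKE-style VQA annotations.
# Assumption: annotations are JSON lists of dicts with fields like
# "img_name", "question", "answer", and "q_lang"; the real schema
# should be checked against the dataset release.
import json
from pathlib import Path

def load_annotations(path: str, lang: str | None = None) -> list[dict]:
    """Load VQA records, optionally filtering by question language."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if lang is not None:
        records = [r for r in records if r.get("q_lang") == lang]
    return records

if __name__ == "__main__":
    train = load_annotations("train.json", lang="en")  # hypothetical file name
    for r in train[:3]:
        print(r["img_name"], "|", r["question"], "->", r["answer"])
```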
Related papers
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect aligned medical image-text data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
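One concrete step behind the subfigure-to-subcaption challenge mentioned in the entry above can be illustrated with a short sketch: splitting a compound figure caption into per-panel subcaptions. The "(A) ... (B) ..." panel-label convention assumed here is a common PubMed caption style, not the paper's documented method.

```python
# Hedged sketch of one pipeline step: splitting a compound PubMed-style
# figure caption into per-panel subcaptions. The "(A)"/"(B)" label
# pattern is an assumption about caption format, not the paper's method.
import re

PANEL_RE = re.compile(r"\(([A-Z])\)")

def split_caption(caption: str) -> dict[str, str]:
    """Map panel labels ('A', 'B', ...) to their subcaption text."""
    parts = PANEL_RE.split(caption)
    # parts = [preamble, 'A', text_a, 'B', text_b, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

caption = "(A) Axial T1-weighted MRI. (B) Same slice after contrast."
print(split_caption(caption))
# {'A': 'Axial T1-weighted MRI.', 'B': 'Same slice after contrast.'}
```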
- MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries [16.101969130235055]
We introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset.
This dataset combines Hindi-English codemixed medical queries with visual aids.
Our dataset, code, and pre-trained models will be made publicly available.
arXiv Detail & Related papers (2024-01-03T07:58:25Z)
- BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering [8.547600133510551]
This paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA.
Our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the problem of insufficient data.
With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset.
arXiv Detail & Related papers (2023-12-13T03:08:48Z)
- Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain.
Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
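The alignment idea described in the PMC-VQA entry above, feeding features from a pre-trained vision encoder into a large language model, is commonly realized with a learned projection. The following is a minimal sketch of that general pattern; the dimensions, the single linear layer, and the VisionToLLMAdapter name are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the general technique: project frozen vision-encoder
# features into an LLM's embedding space and prepend them to the question
# tokens. Dimensions and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)  # learnable bridge

    def forward(self, patch_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, n_patches, vision_dim) from a frozen encoder
        # text_embeds: (batch, n_tokens, llm_dim) from the LLM embedding table
        visual_tokens = self.proj(patch_feats)             # (batch, n_patches, llm_dim)
        return torch.cat([visual_tokens, text_embeds], 1)  # fused input sequence

adapter = VisionToLLMAdapter()
fused = adapter(torch.randn(2, 256, 1024), torch.randn(2, 32, 4096))
print(fused.shape)  # torch.Size([2, 288, 4096])
```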
- A Dataset for Medical Instructional Video Classification and Question Answering [16.748852458926162]
This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos.
We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions.
We have benchmarked each task with the created MedVidCL and MedVidQA datasets and proposed multimodal learning methods.
arXiv Detail & Related papers (2022-01-30T18:06:31Z)
- Medical Visual Question Answering: A Survey [55.53205317089564]
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
arXiv Detail & Related papers (2021-11-19T05:55:15Z)
- Knowledge-Aware Neural Networks for Medical Forum Question Classification [13.22396257705293]
We develop a medical knowledge-aware BERT-based model (MedBERT) that gives more weight to medical concept-bearing words.
We also contribute a multi-label dataset for the Medical Forum Question Classification (MFQC) task.
arXiv Detail & Related papers (2021-09-27T15:57:21Z)
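The MedBERT summary above says concept-bearing words receive more weight but does not specify the mechanism. One plausible (assumed) realization is to scale the embeddings of tokens found in a medical lexicon before encoding, as in the sketch below; the lexicon and boost factor are invented for illustration and are not the paper's implementation.

```python
# Hedged sketch of up-weighting medical concept-bearing tokens before a
# BERT-style encoder. The tiny lexicon and the embedding-scaling scheme
# are assumptions for illustration, not MedBERT's documented design.
import torch

MEDICAL_LEXICON = {"fever", "rash", "insulin"}  # hypothetical concept list

def concept_weights(tokens: list[str], boost: float = 2.0) -> torch.Tensor:
    """Per-token multipliers: >1 for medical concept words, 1 otherwise."""
    return torch.tensor([boost if t.lower() in MEDICAL_LEXICON else 1.0 for t in tokens])

tokens = ["my", "child", "has", "a", "fever", "and", "rash"]
w = concept_weights(tokens)             # (n_tokens,)
embeds = torch.randn(len(tokens), 768)  # stand-in for BERT token embeddings
weighted = embeds * w.unsqueeze(-1)     # emphasize concept-bearing tokens
print(w.tolist())
```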
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines struggle on both tasks, performing poorly on this dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
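The next entity prediction task from the MedDG entry above is naturally framed as multi-label classification over an entity vocabulary. The sketch below shows that common formulation; the entity list, threshold, and NextEntityHead module are assumptions, not the paper's implementation.

```python
# Hedged sketch: next entity prediction as multi-label classification
# over a fixed entity vocabulary, one common formulation of the task.
# The entity list and threshold are invented for illustration.
import torch
import torch.nn as nn

ENTITIES = ["stomach ache", "diarrhea", "gastroscopy", "omeprazole"]  # hypothetical

class NextEntityHead(nn.Module):
    def __init__(self, hidden_dim: int = 768, n_entities: int = len(ENTITIES)):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, n_entities)

    def forward(self, dialogue_encoding: torch.Tensor) -> torch.Tensor:
        # dialogue_encoding: (batch, hidden_dim), e.g. a [CLS] vector
        return torch.sigmoid(self.classifier(dialogue_encoding))  # per-entity probs

head = NextEntityHead()
probs = head(torch.randn(1, 768))
predicted = [e for e, p in zip(ENTITIES, probs[0].tolist()) if p > 0.5]
print(predicted)
```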
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.