A Dataset for Medical Instructional Video Classification and Question
Answering
- URL: http://arxiv.org/abs/2201.12888v1
- Date: Sun, 30 Jan 2022 18:06:31 GMT
- Title: A Dataset for Medical Instructional Video Classification and Question
Answering
- Authors: Deepak Gupta, Kush Attal, and Dina Demner-Fushman
- Abstract summary: This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos.
We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions.
We have benchmarked each task with the created MedVidCL and MedVidQA datasets and proposed multimodal learning methods.
- Score: 16.748852458926162
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a new challenge and datasets to foster research toward
designing systems that can understand medical videos and provide visual answers
to natural language questions. We believe medical videos may provide the best
possible answers to many first aid, medical emergency, and medical education
questions. Toward this, we created the MedVidCL and MedVidQA datasets and
introduced the tasks of Medical Video Classification (MVC) and Medical Visual
Answer Localization (MVAL), two tasks that focus on cross-modal (medical
language and medical video) understanding. The proposed tasks and datasets have
the potential to support the development of sophisticated downstream
applications that can benefit the public and medical practitioners. Our
datasets consist of 6,117 annotated videos for the MVC task and 3,010 annotated
questions and answer timestamps from 899 videos for the MVAL task. These
datasets have been verified and corrected by medical informatics experts. We
have also benchmarked each task with the created MedVidCL and MedVidQA datasets
and proposed multimodal learning methods that set competitive baselines for
future research.
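The MVAL task requires returning the temporal segment of a video that visually answers a health question. As a rough illustration of what such an example and its evaluation could look like, the sketch below defines a minimal question/answer-span record and a temporal intersection-over-union score; the field names, example values, and the choice of metric are assumptions made for illustration, not the dataset's official schema or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class MVALExample:
    """Hypothetical record for one Medical Visual Answer Localization example:
    a health question paired with the video segment (in seconds) that visually
    answers it. Field names are illustrative, not the official MedVidQA schema."""
    question: str
    video_id: str
    answer_start: float  # start of the answer segment, in seconds
    answer_end: float    # end of the answer segment, in seconds


def temporal_iou(pred_start: float, pred_end: float,
                 gold_start: float, gold_end: float) -> float:
    """Intersection over union of two time spans; a common way to score
    temporal localization (assumed here for illustration, not necessarily
    the benchmark's official metric)."""
    intersection = max(0.0, min(pred_end, gold_end) - max(pred_start, gold_start))
    union = max(pred_end, gold_end) - min(pred_start, gold_start)
    return intersection / union if union > 0 else 0.0


# Toy usage with a made-up example and a made-up predicted span.
example = MVALExample(
    question="How should pressure be applied to stop a nosebleed?",
    video_id="example_video_001",  # placeholder identifier
    answer_start=42.0,
    answer_end=78.5,
)
predicted_span = (40.0, 75.0)
print(temporal_iou(*predicted_span, example.answer_start, example.answer_end))  # ~0.86
```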
Related papers
- FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English
Clinical Queries [16.101969130235055]
We introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset.
This dataset combines Hindi-English codemixed medical queries with visual aids.
Our dataset, code, and pre-trained models will be made publicly available.
arXiv Detail & Related papers (2024-01-03T07:58:25Z) - Towards Answering Health-related Questions from Medical Videos: Datasets
and Approaches [21.16331827504689]
A growing number of individuals now prefer instructional videos as they offer a series of step-by-step procedures to accomplish particular tasks.
The instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions.
The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions.
arXiv Detail & Related papers (2023-09-21T16:21:28Z) - Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain.
Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for
Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z) - ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue
System Development [1.4315915057750197]
We publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations.
We propose a simple self-supervised training strategy with span-noise modelling that improves the performance.
arXiv Detail & Related papers (2023-04-27T17:59:53Z) - Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z) - Medical Visual Question Answering: A Survey [55.53205317089564]
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
arXiv Detail & Related papers (2021-11-19T05:55:15Z) - SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical
Visual Question Answering [29.496389523654596]
We present a large bilingual dataset, SLAKE, with comprehensive semantic labels annotated by experienced physicians.
Besides, SLAKE includes richer modalities and covers more human body parts than the currently available dataset.
arXiv Detail & Related papers (2021-02-18T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.