UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image
Enhancement for Gastrointestinal Visual Question Answering
- URL: http://arxiv.org/abs/2307.02783v2
- Date: Mon, 20 Nov 2023 03:15:27 GMT
- Title: UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image
Enhancement for Gastrointestinal Visual Question Answering
- Authors: Triet M. Thai, Anh T. Vo, Hao K. Tieu, Linh N.P. Bui, Thien T.B.
Nguyen
- Abstract summary: The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out the visual question answering task in the gastrointestinal domain.
The multimodal architecture is set up with a BERT encoder and different pre-trained vision models based on convolutional neural network (CNN) and Transformer architectures.
Our best method, which takes advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, artificial intelligence has played an important role in
medicine and disease diagnosis, with many notable applications, one of which is
Medical Visual Question Answering (MedVQA). By combining computer vision and
natural language processing, MedVQA systems can assist experts in extracting
relevant information from medical images based on a given question and providing
precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out
a visual question answering task in the gastrointestinal domain, which includes
gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge
by proposing a multimodal learning method with image enhancement to improve VQA
performance on gastrointestinal images. The multimodal architecture is set up
with a BERT encoder and different pre-trained vision models based on
convolutional neural network (CNN) and Transformer architectures for feature
extraction from the question and the endoscopy image. The results of this study
highlight the dominance of Transformer-based vision models over CNNs and
demonstrate the effectiveness of the image enhancement process, with six of the
eight vision models achieving better F1-Scores. Our best method, which takes
advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25%
accuracy and 91.85% F1-Score on the development test set, while also producing
good results on the private test set with an accuracy of 82.01%.
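The abstract describes the architecture only at a high level, so the following is a minimal sketch of how a BERT+BEiT late-fusion VQA classifier could be wired up with Hugging Face Transformers. The fusion strategy (feature concatenation plus an MLP head), the checkpoint names, the answer-vocabulary size, and the example file name are assumptions for illustration, not the authors' implementation; the image enhancement step is omitted here because the abstract does not specify how it is done.

```python
# A minimal sketch, assuming a simple late-fusion design and publicly available
# checkpoints ("bert-base-uncased", "microsoft/beit-base-patch16-224").
# The classifier head, answer vocabulary size, and file paths are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import BertModel, BertTokenizerFast, BeitModel, BeitImageProcessor


class BertBeitVQA(nn.Module):
    """Answer classification from a question (BERT) and an endoscopy image (BEiT)."""

    def __init__(self, num_answers: int, hidden_size: int = 768):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
        # Late fusion: concatenate pooled question and image features,
        # then classify over a fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, num_answers),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output  # (batch, hidden_size)
        image_feat = self.image_encoder(pixel_values=pixel_values).pooler_output
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)  # (batch, num_answers) logits


# Example usage on a single question/image pair; the answer-vocabulary size
# and the image path are placeholders.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BertBeitVQA(num_answers=18)

question = "Where in the image is the abnormality located?"
image = Image.open("endoscopy_example.jpg").convert("RGB")  # hypothetical file

text_inputs = tokenizer(question, return_tensors="pt", truncation=True)
image_inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(
        text_inputs["input_ids"],
        text_inputs["attention_mask"],
        image_inputs["pixel_values"],
    )
predicted_answer_id = logits.argmax(dim=-1)
```

In this sketch the two encoders are kept frozen or fine-tuned end to end depending on the training budget; swapping the BEiT backbone for a CNN such as ResNet only requires changing the image encoder and its pooled feature size.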
Related papers
- ClinKD: Cross-Modal Clinic Knowledge Distiller For Multi-Task Medical Images [4.353855760968461]
Med-VQA (Medical Visual Question Answering) is a crucial subtask within the broader VQA (Visual Question Answering) domain.
We introduce the ClinKD model, which incorporates modifications to model position encoding and a diversified training process.
We achieve a new state-of-the-art performance on the Med-GRIT-270k dataset.
arXiv Detail & Related papers (2025-02-09T15:08:10Z)
- Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy [0.024999074238880488]
Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by offering a non-invasive method for capturing detailed images of the gastrointestinal tract.
However, its potential is limited by the sheer volume of images generated during the imaging procedure, which can take anywhere from six to eight hours and often produces up to 1 million images.
arXiv Detail & Related papers (2024-10-21T22:52:25Z)
- QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge.
This variability directly impacts the development and evaluation of automated segmentation algorithms.
We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
arXiv Detail & Related papers (2024-03-19T17:57:24Z)
- Free Form Medical Visual Question Answering in Radiology [3.495246564946556]
Research in medical Visual Question Answering has been scant, only gaining momentum since 2018.
Our research delves into the effective representation of radiology images and the joint learning of multimodal representations.
Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T20:26:52Z)
- MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images [10.616065108433798]
We propose a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data.
Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting.
arXiv Detail & Related papers (2023-09-30T15:52:18Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- COVID-Net USPro: An Open-Source Explainable Few-Shot Deep Prototypical Network to Monitor and Detect COVID-19 Infection from Point-of-Care Ultrasound Images [66.63200823918429]
COVID-Net USPro monitors and detects COVID-19 positive cases with high precision and recall from minimal ultrasound images.
The network achieves 99.65% overall accuracy, 99.7% recall and 99.67% precision for COVID-19 positive cases when trained with only 5 shots.
arXiv Detail & Related papers (2023-01-04T16:05:51Z)
- Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important especially in regulated areas such as healthcare.
Previous work uses convolutional neural networks as the backbone architecture, which has been shown to be prone to over-caution and overconfidence in making decisions.
We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z)
- FetReg2021: A Challenge on Placental Vessel Segmentation and Registration in Fetoscopy [52.3219875147181]
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS).
The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination.
Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking.
Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fet
arXiv Detail & Related papers (2022-06-24T23:44:42Z)
- Self-supervised Learning from 100 Million Medical Images [13.958840691105992]
We propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering.
We leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography.
We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR.
arXiv Detail & Related papers (2022-01-04T18:27:04Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)