A reinforcement learning approach for VQA validation: an application to
diabetic macular edema grading
- URL: http://arxiv.org/abs/2307.09886v1
- Date: Wed, 19 Jul 2023 10:31:35 GMT
- Authors: Tatiana Fountoukidou and Raphael Sznitman
- Abstract summary: We focus on providing a richer and more appropriate validation approach for highly powerful Visual Question Answering (VQA) algorithms.
We propose an automatic adaptive questioning method that aims to expose the reasoning behavior of a VQA algorithm.
Experiments show that such an agent behaves similarly to a clinician, asking questions that are relevant to key clinical concepts.
- Score: 2.368995563245609
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in machine learning models have greatly increased the
performance of automated methods in medical image analysis. However, the
internal functioning of such models is largely hidden, which hinders their
integration in clinical practice. Explainability and trust are viewed as
prerequisites for the widespread adoption of such methods in clinical
communities. Validation of machine learning models is therefore an important
aspect, and yet most methods are validated only in a limited way.
In this work, we focus on providing a richer and more appropriate validation
approach for highly powerful Visual Question Answering (VQA) algorithms. To
better understand the performance of these methods, which answer arbitrary
questions related to images, this work focuses on an automatic visual Turing
test (VTT). That is, we propose an automatic adaptive questioning method that
aims to expose the reasoning behavior of a VQA algorithm. Specifically, we
introduce a reinforcement learning (RL) agent that observes the history of
previously asked questions, and uses it to select the next question to pose. We
demonstrate our approach in the context of evaluating algorithms that
automatically answer questions related to diabetic macular edema (DME) grading.
The experiments show that such an agent behaves similarly to a clinician,
asking questions that are relevant to key clinical concepts.
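The questioning loop described above can be sketched as follows. This is a minimal, hypothetical illustration only: the question pool, the reward function, and the tabular epsilon-greedy update are assumptions for demonstration, not the paper's actual RL formulation.

```python
import random

# Illustrative question pool for a DME-grading VQA model (hypothetical).
QUESTIONS = ["is there hard exudate?", "what is the grade?", "is the fovea involved?"]

class QuestioningAgent:
    """Selects the next question to pose, conditioned on the history of
    questions already asked (a simple tabular sketch, not the paper's agent)."""

    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = {}            # value estimates keyed by (history, question)
        self.epsilon = epsilon # exploration rate
        self.alpha = alpha     # learning rate

    def select(self, history):
        # Epsilon-greedy choice among questions not yet asked.
        remaining = [q for q in QUESTIONS if q not in history]
        if not remaining:
            return None
        if random.random() < self.epsilon:
            return random.choice(remaining)
        return max(remaining, key=lambda q: self.q.get((history, q), 0.0))

    def update(self, history, question, reward):
        # Incremental update of the value estimate for this (state, action).
        key = (history, question)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward - old)

def run_episode(agent, vqa_answer, reward_fn):
    """One pass of the visual Turing test: query the VQA model under
    evaluation until the question pool is exhausted."""
    history = ()   # tuple of asked questions serves as the state
    total = 0.0
    while (question := agent.select(history)) is not None:
        answer = vqa_answer(question)      # answer from the VQA model under test
        reward = reward_fn(question, answer)
        agent.update(history, question, reward)
        history = history + (question,)
        total += reward
    return total
```

In this sketch the state is simply the tuple of previously asked questions; the paper's agent similarly conditions its choice on the questioning history, though its state, action space, and reward are defined for the DME-grading task.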
Related papers
- Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering [3.983863335432589]
Medical Visual Question Answering (MedVQA) has gained increasing attention at the intersection of computer vision and natural language processing.
We introduce a novel fusion model that integrates Orthogonality loss, Multi-head attention and Bilinear Attention Network (OMniBAN) to achieve high computational efficiency and strong performance without the need for pre-training.
arXiv Detail & Related papers (2024-10-28T13:24:12Z) - PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z) - Improving Visual Question Answering Models through Robustness Analysis
and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z) - Open-Ended Medical Visual Question Answering Through Prefix Tuning of
Language Models [42.360431316298204]
We focus on open-ended VQA and, motivated by recent advances in language models, treat it as a generative task.
To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens.
We evaluate our approach on the prime medical VQA benchmarks, namely, Slake, OVQA and PathVQA.
arXiv Detail & Related papers (2023-03-10T15:17:22Z) - Morphology-Aware Interactive Keypoint Estimation [32.52024944963992]
Diagnosis based on medical images often involves manual annotation of anatomical keypoints.
We propose a novel deep neural network that automatically detects and refines the anatomical keypoints through a user-interactive system.
arXiv Detail & Related papers (2022-09-15T09:27:14Z) - Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities.
One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data.
Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Consistency-preserving Visual Question Answering in Medical Imaging [2.005299372367689]
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question.
We propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process.
Our experiments show that our method outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2022-06-27T13:38:50Z) - A Review of Uncertainty Quantification in Deep Learning: Techniques,
Applications and Challenges [76.20963684020145]
Uncertainty quantification (UQ) plays a pivotal role in reduction of uncertainties during both optimization and decision making processes.
Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature.
This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning.
arXiv Detail & Related papers (2020-11-12T06:41:05Z) - Explaining Clinical Decision Support Systems in Medical Imaging using
Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
However, clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered opaque and difficult to comprehend.
We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z) - A Question-Centric Model for Visual Question Answering in Medical
Imaging [3.619444603816032]
We present a novel Visual Question Answering approach that allows an image to be queried by means of a written question.
Experiments on a variety of medical and natural image datasets show that by fusing image and question features in a novel way, the proposed approach achieves an equal or higher accuracy compared to current methods.
arXiv Detail & Related papers (2020-03-02T10:16:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.