Free Form Medical Visual Question Answering in Radiology
- URL: http://arxiv.org/abs/2401.13081v1
- Date: Tue, 23 Jan 2024 20:26:52 GMT
- Title: Free Form Medical Visual Question Answering in Radiology
- Authors: Abhishek Narayanan, Rushabh Musthyala, Rahul Sankar, Anirudh Prasad
Nistala, Pranav Singh and Jacopo Cirrone
- Abstract summary: Research in medical Visual Question Answering has been scant, only gaining momentum since 2018.
Our research delves into the effective representation of radiology images and the joint learning of multimodal representations.
Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models.
- Score: 3.495246564946556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Question Answering (VQA) in the medical domain presents a unique,
interdisciplinary challenge, combining fields such as Computer Vision, Natural
Language Processing, and Knowledge Representation. Despite its importance,
research in medical VQA has been scant, only gaining momentum since 2018.
Addressing this gap, our research delves into the effective representation of
radiology images and the joint learning of multimodal representations,
surpassing existing methods. We innovatively augment the SLAKE dataset,
enabling our model to respond to a more diverse array of questions, not limited
to the immediate content of radiology or pathology images. Our model achieves a
top-1 accuracy of 79.55% with a less complex architecture, demonstrating
comparable performance to current state-of-the-art models. This research not
only advances medical VQA but also opens avenues for practical applications in
diagnostic settings.
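The abstract reports a top-1 accuracy but the listing does not say how answers were matched. As a minimal sketch, assuming top-1 accuracy here means the fraction of questions whose single highest-ranked answer equals the reference answer after case and whitespace normalisation, it could be computed as follows. The function name `top1_accuracy` and the sample answers are hypothetical and are not taken from the paper or from SLAKE.

```python
def top1_accuracy(predicted, reference):
    """Fraction of questions whose top-ranked answer matches the reference answer."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy on an empty set")
    # Assumption: answers are compared after trimming whitespace and lowercasing,
    # so "Lung " and "lung" count as the same answer.
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predicted, reference)
    )
    return correct / len(predicted)


# Hypothetical answers, for illustration only.
predicted = ["lung", "yes", "ct", "liver"]
reference = ["Lung", "no", "CT", "liver"]
print(f"top-1 accuracy: {top1_accuracy(predicted, reference):.2%}")
```

Under this reading, the reported 79.55% would correspond to roughly four out of five test questions being answered with the exact reference answer.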
Related papers
- HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements.
We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology [0.14631663747888957]
Content-based image retrieval has the potential to significantly improve diagnostic aid and medical research in radiology.
Current CBIR systems face limitations due to their specialization to certain pathologies, limiting their utility.
We propose using vision foundation models as powerful and versatile off-the-shelf feature extractors for content-based medical image retrieval.
arXiv Detail & Related papers (2024-03-11T10:06:45Z)
- MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images [10.616065108433798]
We propose a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data.
Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting.
arXiv Detail & Related papers (2023-09-30T15:52:18Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
- Medical visual question answering using joint self-supervised learning [8.817054025763325]
The encoder embeds both the image and text modalities using a self-attention mechanism.
The decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset.
arXiv Detail & Related papers (2023-02-25T12:12:22Z)
- In-Line Image Transformations for Imbalanced, Multiclass Computer Vision Classification of Lung Chest X-Rays [91.3755431537592]
This study aims to leverage a body of literature in order to apply image transformations that would serve to balance the lack of COVID-19 LCXR data.
Deep learning techniques such as convolutional neural networks (CNNs) are able to select features that distinguish between healthy and disease states.
This study utilizes a simple CNN architecture for high-performance multiclass LCXR classification at 94 percent accuracy.
arXiv Detail & Related papers (2021-04-06T02:01:43Z)
- Universal Model for Multi-Domain Medical Image Retrieval [88.67940265012638]
Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data.
MIR is becoming increasingly helpful due to the wide use of digital imaging modalities.
However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR.
arXiv Detail & Related papers (2020-07-14T23:22:04Z)