Related papers: Free Form Medical Visual Question Answering in Radiology

Free Form Medical Visual Question Answering in Radiology

URL: http://arxiv.org/abs/2401.13081v1
Date: Tue, 23 Jan 2024 20:26:52 GMT
Title: Free Form Medical Visual Question Answering in Radiology
Authors: Abhishek Narayanan, Rushabh Musthyala, Rahul Sankar, Anirudh Prasad Nistala, Pranav Singh and Jacopo Cirrone
Abstract summary: Research in medical Visual Question Answering has been scant, only gaining momentum since 2018. Our research delves into the effective representation of radiology images and the joint learning of multimodal representations. Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models.
Score: 3.495246564946556
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55\% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical applications in diagnostic settings.

Related papers

A Lightweight Large Vision-language Model for Multimodal Medical Images [0.06990493129893112]
Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. We introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Our results show 73.4% accuracy for open-end questions, surpassing existing models and validating its potential for real-world medical applications.
arXiv Detail & Related papers (2025-04-08T00:19:48Z)
The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models. This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology [0.14631663747888957]
Content-based image retrieval has the potential to significantly improve diagnostic aid and medical research in radiology. Current CBIR systems face limitations due to their specialization to certain pathologies, limiting their utility. We propose using vision foundation models as powerful and versatile off-the-shelf feature extractors for content-based medical image retrieval.
arXiv Detail & Related papers (2024-03-11T10:06:45Z)
MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images [10.616065108433798]
We propose a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data. Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting.
arXiv Detail & Related papers (2023-09-30T15:52:18Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery. We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks. We propose a new paradigm called Medical-knedge-enhanced mulTimOdal pretRaining (MOTOR)
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
Medical visual question answering using joint self-supervised learning [8.817054025763325]
The encoder embeds across the image-text dual modalities with self-attention mechanism. The decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset.
arXiv Detail & Related papers (2023-02-25T12:12:22Z)
In-Line Image Transformations for Imbalanced, Multiclass Computer Vision Classification of Lung Chest X-Rays [91.3755431537592]
This study aims to leverage a body of literature in order to apply image transformations that would serve to balance the lack of COVID-19 LCXR data. Deep learning techniques such as convolutional neural networks (CNNs) are able to select features that distinguish between healthy and disease states. This study utilizes a simple CNN architecture for high-performance multiclass LCXR classification at 94 percent accuracy.
arXiv Detail & Related papers (2021-04-06T02:01:43Z)
Universal Model for Multi-Domain Medical Image Retrieval [88.67940265012638]
Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data. MIR is becoming increasingly helpful due to the wide use of digital imaging modalities. However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR.
arXiv Detail & Related papers (2020-07-14T23:22:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.