On Large Visual Language Models for Medical Imaging Analysis: An
Empirical Study
- URL: http://arxiv.org/abs/2402.14162v1
- Date: Wed, 21 Feb 2024 23:01:38 GMT
- Title: On Large Visual Language Models for Medical Imaging Analysis: An
Empirical Study
- Authors: Minh-Hao Van, Prateek Verma, Xintao Wu
- Abstract summary: Large language models (LLMs) have taken the spotlight in natural language processing.
Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.
- Score: 13.972931873011914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large language models (LLMs) have taken the spotlight in natural
language processing. Further, integrating LLMs with vision enables users to
explore emergent abilities with multimodal data. Visual language models (VLMs),
such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on
various visio-linguistic tasks. Consequently, large models have enormous
potential applications in the biomedical imaging field. In that direction,
however, there is little related work demonstrating the ability of large models
to diagnose diseases. In this work, we study the zero-shot and few-shot
robustness of VLMs on medical imaging analysis tasks. Our
comprehensive experiments demonstrate the effectiveness of VLMs in analyzing
biomedical images such as brain MRIs, microscopic images of blood cells, and
chest X-rays.
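As a concrete illustration of the zero-shot setting described above, the sketch below classifies a chest X-ray with CLIP by scoring the image against candidate text prompts. The checkpoint, prompts, and file path are illustrative assumptions, not the authors' exact protocol; a few-shot setup would instead prepend labeled examples to the prompt of a generative VLM such as LLaVA or Flamingo.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class prompts for a binary chest X-ray task.
labels = ["a chest X-ray showing pneumonia", "a normal chest X-ray"]
image = Image.open("chest_xray.png").convert("RGB")  # placeholder path

inputs = processor(text=labels, images=image, return_tensors="pt",
                   padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, softmaxed into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
print(dict(zip(labels, probs.tolist())))
```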
Related papers
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [56.391404083287235]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations.
We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLMs) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis [12.432542525489236]
Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data.
In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images.
We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable of isolating artefacts in a general sense.
arXiv Detail & Related papers (2024-05-01T21:35:04Z)
- Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps the visual features to probability distributions over Large Multi-modal Models' vocabulary.
We further explore the distribution of visual features in the semantic space within LMMs and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z)
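The "visual words" idea above can be made concrete with a few lines of tensor algebra: each visual feature is scored against the language model's word-embedding matrix and softmax-normalized into a distribution over the vocabulary. The sizes and random tensors below are stand-ins for a real vision encoder and LMM, a minimal sketch rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: d = LMM hidden size, V = vocabulary size.
d, V, num_patches = 4096, 32000, 256

# Stand-in for the LMM's frozen word-embedding matrix (V x d).
vocab_embeddings = torch.randn(V, d)

# Stand-in for visual features already projected into the LMM space.
visual_features = torch.randn(num_patches, d)

# "Visual words": each visual feature is scored against every word
# embedding and normalized into a distribution over the vocabulary.
logits = visual_features @ vocab_embeddings.T      # (num_patches, V)
visual_words = F.softmax(logits, dim=-1)           # rows sum to 1

# The distribution can be re-embedded as a soft mixture of word
# embeddings, representing visual content in the text-embedding space.
soft_tokens = visual_words @ vocab_embeddings      # (num_patches, d)
print(soft_tokens.shape)
```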
- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [92.03764152132315]
We design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model to 6 billion parameters.
This model can be broadly applied to and achieve state-of-the-art performance on 32 generic visual-linguistic benchmarks.
It has powerful visual capabilities and can be a good alternative to ViT-22B.
arXiv Detail & Related papers (2023-12-21T18:59:31Z)
- MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation [28.497591315598402]
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks.
This study investigates the potential of MLLMs in improving the understanding and generation of Chest X-Rays (CXRs).
arXiv Detail & Related papers (2023-12-04T06:40:12Z)
- Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This work presents the first systematic study on transferring Vision-Language Segmentation Models (VLSMs) to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z)
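To make the text-prompted segmentation setting concrete, the sketch below runs CLIPSeg, one publicly available vision-language segmentation model, on a 2D image. The summary above does not specify which VLSMs or prompts the study evaluates, so the checkpoint, image path, and prompts here are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("scan.png").convert("RGB")  # placeholder 2D image
prompts = ["the left lung", "the heart"]       # hypothetical prompts

inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# One low-resolution mask logit map per prompt; threshold after sigmoid.
masks = torch.sigmoid(outputs.logits)
print(masks.shape)
```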
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into the large language model to build an ophthalmic large language-and-vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [53.73049253535025]
Large language models (LLMs) have recently demonstrated their potential in clinical applications.
This paper presents a method for integrating LLMs into medical-image CAD networks.
The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models.
arXiv Detail & Related papers (2023-02-14T18:54:06Z)
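The ChatCAD summary above describes a bridging pattern rather than a specific API: a CAD network's numeric outputs are verbalized into text so an LLM can reason over them. The sketch below shows one hypothetical version of that pattern; the finding names, probabilities, and prompt wording are illustrative, not taken from ChatCAD.

```python
# Hypothetical findings for a chest X-ray CAD classifier.
FINDINGS = ["cardiomegaly", "edema", "consolidation", "atelectasis",
            "pleural effusion"]

def verbalize(probs: list[float]) -> str:
    """Turn per-finding probabilities from a CAD classifier into text."""
    lines = [f"- {name}: probability {p:.2f}"
             for name, p in zip(FINDINGS, probs)]
    return "Findings from a chest X-ray classifier:\n" + "\n".join(lines)

def build_prompt(probs: list[float]) -> str:
    """Compose the final prompt to send to an LLM chat endpoint."""
    return (verbalize(probs)
            + "\n\nBased on these findings, write a short diagnostic "
              "impression and flag anything needing radiologist review.")

# Example with outputs from a hypothetical CAD model on one image; the
# resulting prompt would then be sent to an LLM of choice.
print(build_prompt([0.82, 0.10, 0.05, 0.33, 0.71]))
```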
This list is automatically generated from the titles and abstracts of the papers in this site.