On Large Visual Language Models for Medical Imaging Analysis: An
Empirical Study
- URL: http://arxiv.org/abs/2402.14162v1
- Date: Wed, 21 Feb 2024 23:01:38 GMT
- Title: On Large Visual Language Models for Medical Imaging Analysis: An
Empirical Study
- Authors: Minh-Hao Van, Prateek Verma, Xintao Wu
- Abstract summary: Large language models (LLMs) have taken the spotlight in natural language processing.
Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.
- Score: 13.972931873011914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large language models (LLMs) have taken the spotlight in natural
language processing. Further, integrating LLMs with vision enables users to
explore emergent abilities with multimodal data. Visual language models (VLMs),
such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on
various visio-linguistic tasks. Consequently, large models have enormous
potential applications in the biomedical imaging field. In that direction,
however, there is little related work demonstrating the ability of large models
to diagnose diseases. In this work, we study the zero-shot and few-shot
robustness of VLMs on medical imaging analysis tasks. Our
comprehensive experiments demonstrate the effectiveness of VLMs in analyzing
biomedical images such as brain MRIs, microscopic images of blood cells, and
chest X-rays.
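As a concrete illustration of the zero-shot setting described above, the sketch below classifies a chest X-ray with CLIP by scoring the image against candidate text prompts. The checkpoint, prompts, and file path are illustrative assumptions, not the authors' exact protocol; a few-shot setup would instead prepend labeled examples to the prompt of a generative VLM such as LLaVA or Flamingo.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class prompts for a binary chest X-ray task.
labels = ["a chest X-ray showing pneumonia", "a normal chest X-ray"]
image = Image.open("chest_xray.png").convert("RGB")  # placeholder path

inputs = processor(text=labels, images=image, return_tensors="pt",
                   padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, softmaxed into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
print(dict(zip(labels, probs.tolist())))
```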
Related papers
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [56.391404083287235]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations.
We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLMs) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis [12.432542525489236]
Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data.
In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images.
We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable of isolating artefacts in a general sense.
arXiv Detail & Related papers (2024-05-01T21:35:04Z)
- Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps the visual features to probability distributions over Large Multi-modal Models' vocabulary.
We further explore the distribution of visual features in the semantic space within LMMs and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z)
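The "visual words" idea above can be made concrete with a few lines of tensor algebra: each visual feature is scored against the language model's word-embedding matrix and softmax-normalized into a distribution over the vocabulary. The sizes and random tensors below are stand-ins for a real vision encoder and LMM, a minimal sketch rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: d = LMM hidden size, V = vocabulary size.
d, V, num_patches = 4096, 32000, 256

# Stand-in for the LMM's frozen word-embedding matrix (V x d).
vocab_embeddings = torch.randn(V, d)

# Stand-in for visual features already projected into the LMM space.
visual_features = torch.randn(num_patches, d)

# "Visual words": each visual feature is scored against every word
# embedding and normalized into a distribution over the vocabulary.
logits = visual_features @ vocab_embeddings.T      # (num_patches, V)
visual_words = F.softmax(logits, dim=-1)           # rows sum to 1

# The distribution can be re-embedded as a soft mixture of word
# embeddings, representing visual content in the text-embedding space.
soft_tokens = visual_words @ vocab_embeddings      # (num_patches, d)
print(soft_tokens.shape)
```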
- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [92.03764152132315]
We design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model to 6 billion parameters.
This model can be broadly applied to and achieve state-of-the-art performance on 32 generic visual-linguistic benchmarks.
It has powerful visual capabilities and can be a good alternative to ViT-22B.
arXiv Detail & Related papers (2023-12-21T18:59:31Z)
- MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation [28.497591315598402]
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks.
This study investigates the potential of MLLMs in improving the understanding and generation of Chest X-Rays (CXRs).
arXiv Detail & Related papers (2023-12-04T06:40:12Z)
- Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This work presents the first systematic study on transferring Vision-Language Segmentation Models (VLSMs) to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z)
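To make the text-prompted segmentation setting concrete, the sketch below runs CLIPSeg, one publicly available vision-language segmentation model, on a 2D image. The summary above does not specify which VLSMs or prompts the study evaluates, so the checkpoint, image path, and prompts here are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("scan.png").convert("RGB")  # placeholder 2D image
prompts = ["the left lung", "the heart"]       # hypothetical prompts

inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# One low-resolution mask logit map per prompt; threshold after sigmoid.
masks = torch.sigmoid(outputs.logits)
print(masks.shape)
```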
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into the large language model to build an ophthalmic large language-and-vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [53.73049253535025]
Large language models (LLMs) have recently demonstrated their potential in clinical applications.
This paper presents a method for integrating LLMs into medical-image CAD networks.
The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models.
arXiv Detail & Related papers (2023-02-14T18:54:06Z)
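The ChatCAD summary above describes a bridging pattern rather than a specific API: a CAD network's numeric outputs are verbalized into text so an LLM can reason over them. The sketch below shows one hypothetical version of that pattern; the finding names, probabilities, and prompt wording are illustrative, not taken from ChatCAD.

```python
# Hypothetical findings for a chest X-ray CAD classifier.
FINDINGS = ["cardiomegaly", "edema", "consolidation", "atelectasis",
            "pleural effusion"]

def verbalize(probs: list[float]) -> str:
    """Turn per-finding probabilities from a CAD classifier into text."""
    lines = [f"- {name}: probability {p:.2f}"
             for name, p in zip(FINDINGS, probs)]
    return "Findings from a chest X-ray classifier:\n" + "\n".join(lines)

def build_prompt(probs: list[float]) -> str:
    """Compose the final prompt to send to an LLM chat endpoint."""
    return (verbalize(probs)
            + "\n\nBased on these findings, write a short diagnostic "
              "impression and flag anything needing radiologist review.")

# Example with outputs from a hypothetical CAD model on one image; the
# resulting prompt would then be sent to an LLM of choice.
print(build_prompt([0.82, 0.10, 0.05, 0.33, 0.71]))
```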
This list is automatically generated from the titles and abstracts of the papers in this site.