Few-shot medical image classification with simple shape and texture text
descriptors using vision-language models
- URL: http://arxiv.org/abs/2308.04005v1
- Date: Tue, 8 Aug 2023 02:48:46 GMT
- Title: Few-shot medical image classification with simple shape and texture text
descriptors using vision-language models
- Authors: Michal Byra, Muhammad Febrian Rachmadi, Henrik Skibbe
- Abstract summary: We investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images.
We utilize the GPT-4 model to generate text descriptors that encapsulate the shape and texture characteristics of objects in medical images.
- Score: 1.1172382217477128
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we investigate the usefulness of vision-language models (VLMs)
and large language models for binary few-shot classification of medical images.
We utilize the GPT-4 model to generate text descriptors that encapsulate the
shape and texture characteristics of objects in medical images. Subsequently,
these GPT-4 generated descriptors, alongside VLMs pre-trained on natural
images, are employed to classify chest X-rays and breast ultrasound images. Our
results indicate that few-shot classification of medical images using VLMs and
GPT-4 generated descriptors is a viable approach. However, accurate
classification requires excluding certain descriptors from the calculation of
the classification scores. Moreover, we assess the ability of VLMs to evaluate
shape features in breast mass ultrasound images. We further investigate the
degree of variability among the sets of text descriptors produced by GPT-4. Our
work provides several important insights about the application of VLMs for
medical image analysis.
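The scoring scheme the abstract describes (mean similarity between a VLM image embedding and GPT-4-generated descriptor embeddings, with some descriptors excluded from the score) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the descriptor names and 3-d vectors are hypothetical stand-ins, and a real system would obtain embeddings from a CLIP-style image and text encoder.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_emb, class_descriptors, exclude=()):
    # Score each class as the mean cosine similarity between the image
    # embedding and that class's descriptor embeddings; descriptors
    # listed in `exclude` are skipped, mirroring the paper's finding
    # that some descriptors must be left out of the score.
    scores = {}
    for label, descs in class_descriptors.items():
        kept = [v for k, v in descs.items() if k not in exclude]
        scores[label] = float(np.mean([cosine(image_emb, v) for v in kept]))
    return max(scores, key=scores.get), scores

# Toy 3-d vectors standing in for real VLM text/image embeddings.
class_descriptors = {
    "benign":    {"oval shape":      [1.0, 0.0, 0.0],
                  "smooth margin":   [0.9, 0.1, 0.0]},
    "malignant": {"irregular shape": [0.0, 1.0, 0.0],
                  "generic tissue":  [1.0, 0.0, 0.0]},  # uninformative descriptor
}
image_emb = [0.95, 0.05, 0.0]  # this image resembles the benign descriptors

label_all, scores_all = classify(image_emb, class_descriptors)
label_filtered, scores_filtered = classify(
    image_emb, class_descriptors, exclude=("generic tissue",))
```

Excluding the uninformative "generic tissue" descriptor lowers the malignant class score and widens the decision margin, which is the effect the abstract attributes to descriptor exclusion.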
Related papers
- A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT [0.62914438169038]
This paper presents an approach for fine-tuning BiomedCLIP-PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy frames.
Our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal.
Performance metrics, including classification accuracy, recall, and F1 score, indicate the model's strong ability to accurately identify abnormalities in endoscopic frames.
arXiv Detail & Related papers (2024-10-25T19:42:57Z) - Exploiting LMM-based knowledge for image classification tasks [11.801596051153725]
We use the MiniGPT-4 model to extract semantic descriptions for the images.
In this paper, we propose to additionally use the text encoder to obtain the text embeddings corresponding to the MiniGPT-4-generated semantic descriptions.
The experimental evaluation on three datasets validates the improved classification performance achieved by exploiting LMM-based knowledge.
arXiv Detail & Related papers (2024-06-05T08:56:24Z) - An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging [0.3029213689620348]
We explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V models for medical image analysis.
Both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images.
Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images.
arXiv Detail & Related papers (2024-06-02T08:29:23Z) - Holistic Evaluation of GPT-4V for Biomedical Imaging [113.46226609088194]
GPT-4V represents a breakthrough in artificial general intelligence for computer vision.
We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more.
Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization.
arXiv Detail & Related papers (2023-11-10T18:40:44Z) - A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image
Diagnosis [15.13309228766603]
We propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis.
The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge.
Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline.
arXiv Detail & Related papers (2023-07-05T01:45:19Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Cross-modulated Few-shot Image Generation for Colorectal Tissue
Classification [58.147396879490124]
Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images.
To the best of our knowledge, we are the first to investigate few-shot generation in colorectal tissue images.
arXiv Detail & Related papers (2023-04-04T17:50:30Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, the Show-Attend-Tell captioning model and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and the general-purpose MS-COCO dataset.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Semantic segmentation of multispectral photoacoustic images using deep
learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z) - A Bag of Visual Words Model for Medical Image Retrieval [0.9137554315375919]
Bag of Visual Words (BoVW) is a technique that can be used to effectively represent intrinsic image features in vector space.
We present a MedIR approach based on the BoVW model for content-based medical image retrieval.
arXiv Detail & Related papers (2020-07-18T16:21:30Z)
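The BoVW retrieval idea in the last entry can be sketched in a few lines: local descriptors are quantized against a visual-word codebook and images are ranked by histogram intersection. This is a toy illustration with made-up 2-d descriptors and a fixed codebook; a real system would extract SIFT/ORB features and learn the codebook with k-means over a training set.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest visual word and
    # return an L1-normalized word histogram for the image.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def retrieve(query_hist, index):
    # Rank database images by histogram intersection (higher = closer).
    sims = {name: float(np.minimum(query_hist, h).sum())
            for name, h in index.items()}
    return sorted(sims, key=sims.get, reverse=True)

# Toy data: a 3-word codebook and two database images.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
img_a = np.array([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0]])   # mostly word 0
img_b = np.array([[1.0, 0.9], [0.9, 1.1], [1.0, 1.0]])   # mostly word 1
index = {"img_a": bovw_histogram(img_a, codebook),
         "img_b": bovw_histogram(img_b, codebook)}

query = np.array([[0.05, 0.05], [0.1, 0.0], [1.0, 1.0]])  # resembles img_a
ranking = retrieve(bovw_histogram(query, codebook), index)
```

The query's word histogram matches img_a's, so img_a is ranked first; histogram intersection is one of several common choices here (chi-squared or cosine distance work as well).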
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.