A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image
Diagnosis
- URL: http://arxiv.org/abs/2307.01981v1
- Date: Wed, 5 Jul 2023 01:45:19 GMT
- Title: A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image
Diagnosis
- Authors: Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu
Liu
- Abstract summary: We propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis.
The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge.
Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline.
- Score: 15.13309228766603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot medical image classification is a critical process in real-world
scenarios where we have limited access to all possible diseases or large-scale
annotated data. It involves computing similarity scores between a query medical
image and possible disease categories to determine the diagnostic result.
Recent advances in pretrained vision-language models (VLMs) such as CLIP have
shown great performance for zero-shot natural image recognition and exhibit
benefits in medical applications. However, an explainable zero-shot medical
image recognition framework with strong performance has yet to be developed.
In this paper, we propose a novel CLIP-based zero-shot medical
image classification framework supplemented with ChatGPT for explainable
diagnosis, mimicking the diagnostic process performed by human experts. The key
idea is to query large language models (LLMs) with category names to
automatically generate additional cues and knowledge, such as disease symptoms
or descriptions beyond the single category name, helping CLIP produce more
accurate and explainable diagnoses. We further design specific prompts
to enhance the quality of generated texts by ChatGPT that describe visual
medical features. Extensive results on one private dataset and four public
datasets along with detailed analysis demonstrate the effectiveness and
explainability of our training-free zero-shot diagnosis pipeline, corroborating
the great potential of VLMs and LLMs for medical applications.
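The scoring step the abstract describes, comparing a query image against per-category text cues, can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the embeddings are random stand-ins for CLIP image/text features, and the descriptor lists stand in for ChatGPT-generated cues.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # embedding size used by CLIP ViT-B/32

# Hypothetical ChatGPT-generated visual cues per disease category.
descriptors = {
    "pneumonia": ["patchy lung opacity", "air bronchograms"],
    "cardiomegaly": ["enlarged cardiac silhouette", "increased cardiothoracic ratio"],
}

def normalize(v):
    # Scale vectors to unit length so dot products are cosine similarities.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def encode_text(texts):
    # Stand-in for CLIP's text encoder; the real pipeline would call
    # something like model.encode_text on tokenized prompts.
    return normalize(rng.standard_normal((len(texts), DIM)))

# Stand-in for the encoded query image (model.encode_image in practice).
image_emb = normalize(rng.standard_normal(DIM))

# Score each category by averaging similarity over its descriptors, so each
# cue contributes both evidence and a human-readable explanation.
scores = {}
for disease, cues in descriptors.items():
    text_embs = encode_text([f"a medical image showing {c}" for c in cues])
    scores[disease] = float((text_embs @ image_emb).mean())

prediction = max(scores, key=scores.get)
print(prediction, scores)
```

The cue whose similarity contributes most to the winning category can then be surfaced as the explanation, which is the sense in which the pipeline mimics a clinician citing observed symptoms.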
Related papers
- Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
This study focuses on the clinical evaluation of medical synthetic data generation using artificial intelligence (AI) models.
The paper contributes by a) presenting a protocol for the systematic evaluation of synthetic images by medical experts and b) applying it to assess TIDE-II, a novel variational autoencoder-based model for high-resolution WCE image synthesis.
The results show that TIDE-II generates clinically relevant WCE images, helping to address data scarcity and enhance diagnostic tools.
arXiv Detail & Related papers (2024-10-31T19:48:50Z)
- A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT [0.62914438169038]
This paper presents an advanced approach for fine-tuning BiomedCLIP-PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy frames.
Our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal.
Performance metrics, including classification accuracy, recall, and F1 score, indicate the model's strong ability to accurately identify abnormalities in endoscopic frames.
arXiv Detail & Related papers (2024-10-25T19:42:57Z)
- Visual Prompt Engineering for Medical Vision Language Models in Radiology [0.1636269503300992]
Vision-language models (VLMs) offer a promising solution, leveraging large-scale pre-training to improve zero-shot classification performance.
In this paper, we explore the potential of visual prompt engineering to direct the model's attention to critical regions.
arXiv Detail & Related papers (2024-08-28T13:53:27Z)
- A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning [11.817595076396925]
Diagnostic Captioning (DC) automatically generates a diagnostic text from one or more medical images of a patient.
We propose a new data-driven guided decoding method that incorporates medical information into the beam search of the diagnostic text generation process.
We evaluate the proposed method on two medical datasets using four DC systems that range from generic image-to-text systems with CNN encoders to pre-trained Large Language Models.
arXiv Detail & Related papers (2024-06-20T10:08:17Z)
- Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition.
Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports.
The proposed framework addresses this by decomposing disease descriptions into multiple aspects, consulting a large language model and medical experts.
The proposed method improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
arXiv Detail & Related papers (2024-03-12T13:18:22Z)
- Unlocking the Potential of Medical Imaging with ChatGPT's Intelligent Diagnostics [2.8484009470171943]
This article aims to design a decision support system to assist healthcare providers and patients in making decisions about diagnosing, treating, and managing health conditions.
The proposed architecture contains three stages: 1) data collection and labeling, 2) model training, and 3) diagnosis report generation.
The proposed system has the potential to enhance decision-making, reduce costs, and improve the capabilities of healthcare providers.
arXiv Detail & Related papers (2023-05-12T12:52:14Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- BI-RADS-Net: An Explainable Multitask Learning Approach for Cancer Diagnosis in Breast Ultrasound Images [69.41441138140895]
This paper introduces BI-RADS-Net, a novel explainable deep learning approach for cancer detection in breast ultrasound images.
The proposed approach incorporates tasks for explaining and classifying breast tumors, by learning feature representations relevant to clinical diagnosis.
Explanations of the predictions (benign or malignant) are provided in terms of morphological features that are used by clinicians for diagnosis and reporting in medical practice.
arXiv Detail & Related papers (2021-10-05T19:14:46Z)
- Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The system first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates their likelihood of malignancy, and, through aggregation, generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
- An Interpretable Multiple-Instance Approach for the Detection of Referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.