Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification
- URL: http://arxiv.org/abs/2306.01111v2
- Date: Tue, 12 Sep 2023 20:58:35 GMT
- Title: Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification
- Authors: Cara Van Uden and Christian Bluethgen and Maayane Attias and
Malgorzata Polacin and Haiwei Henry Guo and Neha Simha and Rishi Raj and
Curtis Langlotz
- Abstract summary: We propose a machine learning approach that utilizes CLIP, a multimodal (image and text) self-supervised model, for ILD classification.
We extensively integrate zero-shot CLIP throughout our workflow, starting from the initial extraction of image patches from volumetric CT scans.
We achieve strong zero-shot ILD classification results, including an AUROC of 0.893, without the need for any labeled training data.
- Score: 0.36646002427839136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interstitial lung diseases (ILD) present diagnostic challenges due to their
varied manifestations and overlapping imaging features. To address this, we
propose a machine learning approach that utilizes CLIP, a multimodal (image and
text) self-supervised model, for ILD classification. We extensively integrate
zero-shot CLIP throughout our workflow, starting from the initial extraction of
image patches from volumetric CT scans and proceeding to ILD classification
using "patch montages". Furthermore, we investigate how domain adaptive
pretraining (DAPT) CLIP with task-specific images (CT "patch montages"
extracted with ILD-specific prompts for CLIP) and/or text (lung-specific
sections of radiology reports) affects downstream ILD classification
performance. By leveraging CLIP-extracted "patch montages" and DAPT, we achieve
strong zero-shot ILD classification results, including an AUROC of 0.893,
without the need for any labeled training data. This work highlights the
versatility and potential of multimodal models like CLIP for medical image
classification tasks where labeled data is scarce.
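As a concrete illustration of the zero-shot step, the sketch below assembles a toy "patch montage" and scores it against ILD text prompts with an off-the-shelf CLIP checkpoint via Hugging Face transformers. The montage layout, prompt wording, class list, and file paths are illustrative assumptions, not the authors' exact pipeline, and the generic OpenAI checkpoint stands in for their domain-adapted (DAPT) weights.

```python
# Minimal sketch of zero-shot CLIP classification on a CT "patch montage".
# Assumptions (not from the paper): a 2x2 montage grid, illustrative ILD
# prompts, placeholder patch paths, and the generic OpenAI CLIP checkpoint.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def make_montage(patches, grid=(2, 2), patch_size=224):
    """Tile lung patches into one image so CLIP sees several regions at once."""
    montage = Image.new("RGB", (grid[0] * patch_size, grid[1] * patch_size))
    for i, patch in enumerate(patches[: grid[0] * grid[1]]):
        x, y = (i % grid[0]) * patch_size, (i // grid[0]) * patch_size
        montage.paste(patch.resize((patch_size, patch_size)), (x, y))
    return montage

# Hypothetical prompt set; the paper engineers ILD-specific prompts.
prompts = [
    "a CT scan of lungs with usual interstitial pneumonia",
    "a CT scan of lungs with nonspecific interstitial pneumonia",
    "a CT scan of healthy lungs",
]

patches = [Image.open(f"patch_{i}.png") for i in range(4)]  # placeholder paths
inputs = processor(text=prompts, images=make_montage(patches),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)  # zero-shot class probabilities
print(dict(zip(prompts, probs.squeeze(0).tolist())))
```

No labeled training data is involved: the class decision comes entirely from image-text similarity, which is what makes the approach attractive when annotations are scarce.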
Related papers
- KPL: Training-Free Medical Knowledge Mining of Vision-Language Models [38.85906425979443]
Knowledge Proxy Learning (KPL) is designed to leverage CLIP's multimodal understanding for medical image classification.
KPL retrieves image-relevant knowledge descriptions from a constructed knowledge-enhanced base to enrich semantic text proxies.
It then harnesses input images and these descriptions, encoded via CLIP, to stably generate multimodal proxies that boost zero-shot classification performance.
arXiv Detail & Related papers (2025-01-20T02:31:00Z)
- A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT [0.62914438169038]
This paper presents an advanced approach for fine-tuning BiomedCLIP-PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy frames.
Our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal.
Performance metrics, including classification accuracy, recall, and F1 score, indicate the model's strong ability to accurately identify abnormalities in endoscopic frames.
arXiv Detail & Related papers (2024-10-25T19:42:57Z)
- PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology [9.556246087301883]
We present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings.
PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use.
Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching that of a supervised aggregator model.
arXiv Detail & Related papers (2024-05-16T16:59:12Z)
- A Classification-Based Adaptive Segmentation Pipeline: Feasibility Study Using Polycystic Liver Disease and Metastases from Colorectal Cancer CT Images [0.261201916989931]
The purpose of this study is to explore the feasibility of building a workflow to efficiently train segmentation models.
By implementing a deep learning model to automatically classify images and route them to the appropriate segmentation models, we hope our workflow can accurately segment images with different pathologies.
arXiv Detail & Related papers (2024-05-02T18:05:37Z)
- CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement [65.47237619200442]
Contrastive language-image pretraining (CLIP) is a standard method for training vision-language models.
We augment CLIP training with task-specific vision models from model zoos to improve its visual representations.
This simple setup shows substantial improvements of up to 16.3% across different vision tasks.
arXiv Detail & Related papers (2023-10-21T20:20:13Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Contrastive Centroid Supervision Alleviates Domain Shift in Medical Image Classification [9.709678461254972]
Feature Centroid Contrast Learning (FCCL) improves target-domain classification performance by adding extra centroid-based supervision during training (a loose sketch of this idea follows the list below).
We verify through extensive experiments that FCCL can achieve superior performance on at least three imaging modalities.
arXiv Detail & Related papers (2022-05-31T09:54:17Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing the lung region in CXR images, and a CXR classification model whose backbone is a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR datasets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment [57.419727894848485]
A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images.
Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images.
We propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification.
arXiv Detail & Related papers (2021-08-05T09:31:46Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of features of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
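The FCCL entry above describes its centroid-based supervision only at a high level. As a loose sketch of the general idea (an assumed formulation, not the authors' implementation), the loss below keeps an exponential-moving-average centroid per class and uses a temperature-scaled cross-entropy over feature-to-centroid similarities, pulling each feature toward its own class centroid and away from the others.

```python
# Loose sketch of feature-centroid contrastive supervision in the spirit of
# FCCL. All names and hyperparameters here are illustrative assumptions;
# the actual FCCL formulation may differ.
import torch
import torch.nn.functional as F

class CentroidContrastLoss(torch.nn.Module):
    def __init__(self, num_classes, feat_dim, momentum=0.9, temperature=0.1):
        super().__init__()
        self.momentum = momentum
        self.temperature = temperature
        # Running per-class feature centroids, tracked outside autograd.
        self.register_buffer("centroids", torch.zeros(num_classes, feat_dim))

    @torch.no_grad()
    def update_centroids(self, feats, labels):
        for c in labels.unique():
            batch_mean = feats[labels == c].mean(dim=0)
            self.centroids[c] = (self.momentum * self.centroids[c]
                                 + (1 - self.momentum) * batch_mean)

    def forward(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        centroids = F.normalize(self.centroids, dim=1)
        # Similarity of each feature to every class centroid; cross-entropy
        # pulls features toward their own centroid and away from the rest.
        logits = feats @ centroids.t() / self.temperature
        loss = F.cross_entropy(logits, labels)
        self.update_centroids(feats.detach(), labels)
        return loss
```

In practice a term like this would be added to the standard classification loss with a tunable weight; the momentum and temperature values above are placeholders.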