MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention
- URL: http://arxiv.org/abs/2501.03839v1
- Date: Tue, 07 Jan 2025 14:49:12 GMT
- Title: MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention
- Authors: Aadya Arora, Vinay Namboodiri
- Abstract summary: We propose to leverage the advanced segmentation capabilities of Segment Anything Model 2 (SAM2) as a visual prompting cue to help the visual encoder in CLIP (Contrastive Language-Image Pretraining) by guiding its attention to relevant regions in the image.
This helps the model focus on highly discriminative regions without being distracted by visually similar background features.
We evaluate our method on diverse medical datasets including X-rays, CT scans, and MRI images, and report accuracies of (71%, 81%, 86%, 58%) for the proposed approach on the (COVID, lung-disease, brain-tumor, breast-cancer) datasets.
- Score: 1.2277343096128712
- Abstract: With the popularity of foundational models, parameter-efficient fine-tuning has become the de facto approach to leveraging pretrained models for downstream tasks. Taking inspiration from recent advances in large language models, Visual Prompt Tuning and similar techniques learn an additional prompt to efficiently fine-tune a pretrained vision foundational model. However, we observe that such prompting is insufficient for fine-grained visual classification tasks such as medical image classification, where there is large inter-class variance, and small intra-class variance. Hence, in this paper we propose to leverage the advanced segmentation capabilities of Segment Anything Model 2 (SAM2) as a visual prompting cue to help the visual encoder in CLIP (Contrastive Language-Image Pretraining) by guiding the attention in the CLIP visual encoder to relevant regions in the image. This helps the model focus on highly discriminative regions without being distracted by visually similar background features, an essential requirement in a few-shot, fine-grained classification setting. We evaluate our method on diverse medical datasets including X-rays, CT scans, and MRI images, and report accuracies of (71%, 81%, 86%, 58%) for the proposed approach on the (COVID, lung-disease, brain-tumor, breast-cancer) datasets, against (66%, 70%, 68%, 29%) for a pretrained CLIP model after few-shot training. The proposed approach also yields an interpretable explanation of the classification performance through the localization obtained from segmentation.
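The abstract describes using a segmentation mask as a visual prompt so that the image encoder attends to the relevant region. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation: it assumes the mask is a binary per-pixel map (as a segmentation model such as SAM2 could provide) and simply dims background pixels before the image would be passed to an encoder. The function name `apply_mask_focus` and the `background_weight` parameter are hypothetical, and the paper may guide attention inside the encoder in a different way.

```python
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]
Mask = List[List[int]]     # binary mask: 1 = region of interest, 0 = background


def apply_mask_focus(image: Image, mask: Mask, background_weight: float = 0.3) -> Image:
    """Keep masked pixels unchanged and scale background pixels down.

    Hypothetical helper for illustration; not taken from the paper.
    """
    same_rows = len(image) == len(mask)
    same_cols = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_rows and same_cols):
        raise ValueError("image and mask must have the same shape")
    if not 0.0 <= background_weight <= 1.0:
        raise ValueError("background_weight must lie in [0, 1]")

    return [
        [px if m else px * background_weight for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x2 example: the diagonal pixels are "foreground".
toy_image = [[0.8, 0.8], [0.4, 1.0]]
toy_mask = [[1, 0], [0, 1]]
focused = apply_mask_focus(toy_image, toy_mask, background_weight=0.5)
```

In this toy example the two foreground pixels keep their values while the two background pixels are halved. A soft weight rather than a hard zero is an assumption made here so that some surrounding context is retained.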
Related papers
- Embeddings are all you need! Achieving High Performance Medical Image Classification through Training-Free Embedding Analysis [0.0]
Developing artificial intelligence (AI) and machine learning (ML) models for medical imaging typically involves extensive training and testing on large datasets.
We investigated the feasibility of replacing conventional training procedures with an embedding-based approach.
arXiv Detail & Related papers (2024-12-12T16:59:37Z)
- Visual Prompt Engineering for Vision Language Models in Radiology [0.17183214167143138]
Contrastive Language-Image Pretraining (CLIP) offers a solution by enabling zero-shot classification through large-scale pretraining.
Visual markers improve AUROC by up to 0.185, highlighting their effectiveness in enhancing classification performance.
We release our code and preprocessing pipeline, providing a reference point for future work on localized classification in medical imaging.
arXiv Detail & Related papers (2024-08-28T13:53:27Z)
- Explanations of Classifiers Enhance Medical Image Segmentation via End-to-end Pre-training [37.11542605885003]
Medical image segmentation aims to identify and locate abnormal structures in medical images, such as chest radiographs, using deep neural networks.
Our work collects explanations from well-trained classifiers to generate pseudo labels of segmentation tasks.
We then use the Integrated Gradients (IG) method to distill and boost the explanations obtained from the classifiers, generating massive diagnosis-oriented localization labels (DoLL).
These DoLL-annotated images are used for pre-training the model before fine-tuning it for downstream segmentation tasks, including COVID-19 infectious areas, lungs, heart, and clavicles.
arXiv Detail & Related papers (2024-01-16T16:18:42Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Forward-Forward Contrastive Learning [4.465144120325802]
We propose Forward Forward Contrastive Learning (FFCL) as a novel pretraining approach for medical image classification.
FFCL achieves superior performance (3.69% accuracy over ImageNet pretrained ResNet-18) over existing pretraining models in the pneumonia classification task.
arXiv Detail & Related papers (2023-05-04T15:29:06Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training [52.895952593202054]
We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation.
PCA learns the pixel structure and context information in each patch to get enough gradient feedback, which helps the discriminator converge to an optimal state.
Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
arXiv Detail & Related papers (2022-07-24T07:45:47Z)
- Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation [0.16490701092527607]
We propose an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans.
Our pretraining approach first randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortions and loss of information.
The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated.
arXiv Detail & Related papers (2022-07-17T13:28:52Z)
- Intelligent Masking: Deep Q-Learning for Context Encoding in Medical Image Analysis [48.02011627390706]
We develop a novel self-supervised approach that occludes targeted regions to improve the pre-training procedure.
We show that training the agent against the prediction model can significantly improve the semantic features extracted for downstream classification tasks.
arXiv Detail & Related papers (2022-03-25T19:05:06Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.