DINOv2 based Self Supervised Learning For Few Shot Medical Image Segmentation
- URL: http://arxiv.org/abs/2403.03273v1
- Date: Tue, 5 Mar 2024 19:13:45 GMT
- Authors: Lev Ayzenberg, Raja Giryes, Hayit Greenspan
- Abstract summary: Few-shot segmentation offers a promising solution by endowing models with the capacity to learn novel classes from limited labeled examples.
A leading method for FSS is ALPNet, which compares features between the query image and the few available support segmented images.
We present a novel approach to few-shot segmentation that not only enhances performance but also paves the way for more robust and adaptable medical image analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have emerged as the cornerstone of medical image
segmentation, but their efficacy hinges on the availability of extensive
manually labeled datasets and their adaptability to unforeseen categories
remains a challenge. Few-shot segmentation (FSS) offers a promising solution by
endowing models with the capacity to learn novel classes from limited labeled
examples. A leading method for FSS is ALPNet, which compares features between
the query image and the few available support segmented images. A key question
about using ALPNet is how to design its features. In this work, we delve into
the potential of using features from DINOv2, which is a foundational
self-supervised learning model in computer vision. Leveraging the strengths of
ALPNet and harnessing the feature extraction capabilities of DINOv2, we present
a novel approach to few-shot segmentation that not only enhances performance
but also paves the way for more robust and adaptable medical image analysis.
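The abstract says that the method compares features between the query image and the segmented support images. The sketch below illustrates one generic way such a comparison can work: pool support features into a foreground and a background prototype, then label each query location by cosine similarity. It is not the authors' ALPNet or DINOv2 code. The function names, the single-prototype pooling, and the toy 2-D features (hypothetical stand-ins for backbone features such as those from DINOv2) are all assumptions made for illustration.

```python
import numpy as np


def masked_average_prototype(features, mask):
    """Average the feature vectors at locations where mask == 1.

    features: (H, W, C) array of per-location features.
    mask:     (H, W) binary array marking the region to pool over.
    """
    weights = mask.astype(features.dtype)[..., None]        # (H, W, 1)
    total = (features * weights).sum(axis=(0, 1))            # (C,)
    return total / max(weights.sum(), 1e-8)


def cosine_similarity_map(features, prototype):
    """Cosine similarity between each location's feature and one prototype."""
    feat_norm = np.linalg.norm(features, axis=-1) + 1e-8     # (H, W)
    proto_norm = np.linalg.norm(prototype) + 1e-8
    return (features @ prototype) / (feat_norm * proto_norm)


def segment_query(support_feats, support_mask, query_feats):
    """Label a query location as foreground when it is closer to the
    foreground prototype than to the background prototype."""
    fg = masked_average_prototype(support_feats, support_mask)
    bg = masked_average_prototype(support_feats, 1 - support_mask)
    fg_sim = cosine_similarity_map(query_feats, fg)
    bg_sim = cosine_similarity_map(query_feats, bg)
    return (fg_sim > bg_sim).astype(np.uint8)


def toy_features(mask, rng):
    """Hypothetical stand-in for backbone features: 2-D vectors plus noise."""
    base = np.where(mask[..., None] == 1, [1.0, 0.0], [0.0, 1.0])
    return base + 0.01 * rng.standard_normal(base.shape)


rng = np.random.default_rng(0)

# One labeled support image and one query image on an 8x8 feature grid.
support_mask = np.zeros((8, 8), dtype=np.int64)
support_mask[2:6, 2:6] = 1
query_mask = np.zeros((8, 8), dtype=np.int64)
query_mask[1:4, 3:7] = 1

support_feats = toy_features(support_mask, rng)
query_feats = toy_features(query_mask, rng)

pred = segment_query(support_feats, support_mask, query_feats)
print("pixels matching the query ground truth:", int((pred == query_mask).sum()), "of", pred.size)
```

In a real pipeline the toy features would be replaced by feature maps from a pretrained encoder, and the single global prototype per class would typically be replaced by a richer set of prototypes.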
Related papers
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models [81.71651422951074]
The Chain-of-Spot (CoS) method is a novel approach that enhances feature extraction by focusing on key regions of interest.
This technique allows LVLMs to access more detailed visual information without altering the original image resolution.
Our empirical findings demonstrate a significant improvement in LVLMs' ability to understand and reason about visual content.
arXiv Detail & Related papers (2024-03-19T17:59:52Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Self-Supervised Open-Ended Classification with Small Visual Language Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
By using models with approximately 1B parameters, we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z)
- A Dual-branch Self-supervised Representation Learning Framework for Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overheads in whole slide images.
These SSL approaches are not designed for handling multi-resolution WSIs, which limits their performance in learning discriminative image features.
We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
- Exemplar Learning for Medical Image Segmentation [38.61378161105941]
We propose an Exemplar Learning-based Synthesis Net (ELSNet) framework for medical image segmentation.
ELSNet introduces two new modules for image segmentation: an exemplar-guided synthesis module and a pixel-prototype based contrastive embedding module.
We conduct experiments on several organ segmentation datasets and present an in-depth analysis.
arXiv Detail & Related papers (2022-04-03T00:10:06Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes a random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
- Pay Attention with Focus: A Novel Learning Scheme for Classification of Whole Slide Images [8.416553728391309]
We propose a novel two-stage approach to analyze whole slide images (WSIs).
First, we extract a set of representative patches (called mosaic) from a WSI.
Each patch of a mosaic is encoded to a feature vector using a deep network.
In the second stage, a set of encoded patch-level features from a WSI is used to compute the primary diagnosis probability.
arXiv Detail & Related papers (2021-06-11T21:59:02Z)
- Whole Slide Images based Cancer Survival Prediction using Attention Guided Deep Multiple Instance Learning Networks [38.39901070720532]
Current image-based survival models are limited to key patches or clusters derived from Whole Slide Images (WSIs).
We propose Deep Attention Multiple Instance Survival Learning (DeepAttnMISL) by introducing both siamese MI-FCN and attention-based MIL pooling.
We evaluated our methods on two large cancer whole slide images datasets and our results suggest that the proposed approach is more effective and suitable for large datasets.
arXiv Detail & Related papers (2020-09-23T14:31:15Z)
- Pairwise Relation Learning for Semi-supervised Gland Segmentation [90.45303394358493]
We propose a pairwise relation-based semi-supervised (PRS2) model for gland segmentation on histology images.
This model consists of a segmentation network (S-Net) and a pairwise relation network (PR-Net).
We evaluate our model against five recent methods on the GlaS dataset and three recent methods on the CRAG dataset.
arXiv Detail & Related papers (2020-08-06T15:02:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.