DINOv2 based Self Supervised Learning For Few Shot Medical Image
Segmentation
- URL: http://arxiv.org/abs/2403.03273v1
- Date: Tue, 5 Mar 2024 19:13:45 GMT
- Title: DINOv2 based Self Supervised Learning For Few Shot Medical Image
Segmentation
- Authors: Lev Ayzenberg, Raja Giryes, Hayit Greenspan
- Abstract summary: Few-shot segmentation offers a promising solution by endowing models with the capacity to learn novel classes from limited labeled examples.
A leading method for FSS is ALPNet, which compares features between the query image and the few available support segmented images.
We present a novel approach to few-shot segmentation that not only enhances performance but also paves the way for more robust and adaptable medical image analysis.
- Score: 33.471116581196796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have emerged as the cornerstone of medical image
segmentation, but their efficacy hinges on the availability of extensive
manually labeled datasets, and their adaptability to unforeseen categories
remains a challenge. Few-shot segmentation (FSS) offers a promising solution by
endowing models with the capacity to learn novel classes from limited labeled
examples. A leading method for FSS is ALPNet, which compares features between
the query image and the few available support segmented images. A key question
about using ALPNet is how to design its features. In this work, we delve into
the potential of using features from DINOv2, which is a foundational
self-supervised learning model in computer vision. Leveraging the strengths of
ALPNet and harnessing the feature extraction capabilities of DINOv2, we present
a novel approach to few-shot segmentation that not only enhances performance
but also paves the way for more robust and adaptable medical image analysis.
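
As a concrete illustration of the ALPNet-style pipeline described above, the sketch below extracts DINOv2 patch features via `torch.hub`, pools a foreground prototype from the support mask, and scores query patches by cosine similarity. The tensor names, the ViT-S/14 backbone choice, and the masked-average-pooling step are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: DINOv2 patch features + prototype matching (ALPNet-style).
# Illustrative only; not the paper's implementation.
import torch
import torch.nn.functional as F

# Load a small DINOv2 backbone from torch hub (downloads weights on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

PATCH = 14  # ViT-S/14 patch size

@torch.no_grad()
def patch_features(img):  # img: (1, 3, H, W), H and W divisible by 14
    feats = model.forward_features(img)["x_norm_patchtokens"]  # (1, N, C)
    h, w = img.shape[-2] // PATCH, img.shape[-1] // PATCH
    return feats.reshape(1, h, w, -1).permute(0, 3, 1, 2)      # (1, C, h, w)

@torch.no_grad()
def segment(query_img, support_img, support_mask):
    q = patch_features(query_img)                              # (1, C, h, w)
    s = patch_features(support_img)
    m = F.interpolate(support_mask[None, None].float(), s.shape[-2:])
    # Masked average pooling -> one foreground prototype vector.
    proto = (s * m).sum((-2, -1)) / m.sum((-2, -1)).clamp(min=1e-6)  # (1, C)
    sim = F.cosine_similarity(q, proto[..., None, None], dim=1)      # (1, h, w)
    return F.interpolate(sim[None], query_img.shape[-2:])[0, 0]      # score map

# Usage with dummy tensors (replace with normalized medical slices):
img_q = torch.randn(1, 3, 224, 224)
img_s = torch.randn(1, 3, 224, 224)
mask_s = torch.randn(224, 224) > 1.0
print(segment(img_q, img_s, mask_s).shape)  # torch.Size([224, 224])
```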
Related papers
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [99.9389737339175]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
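
A minimal sketch of how such a self-constructed preference pair might look, assuming a generic captioning call `describe` and an image corruption `corrupt` as hypothetical stand-ins; the paper's actual construction may differ:

```python
# Hypothetical sketch of STIC-style preference construction for image
# descriptions; `describe` and `corrupt` are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    image_id: str
    chosen: str    # description of the clean image with a good prompt
    rejected: str  # description of a corrupted image (dispreferred)

def build_pair(image_id, describe, corrupt):
    prompt = "Describe the image in detail."
    return PreferencePair(
        image_id,
        chosen=describe(image_id, prompt),
        rejected=describe(corrupt(image_id), prompt),
    )

# Stub usage; replace the lambdas with real VLM calls.
pair = build_pair("img_001",
                  describe=lambda img, p: f"caption({img})",
                  corrupt=lambda img: img + "_blurred")
print(pair.chosen, "|", pair.rejected)
```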
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
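
A minimal sketch of the mutual-attention idea, assuming support and query images have already been split into patch embeddings; the head count, dimensions, and use of `nn.MultiheadAttention` are assumptions, not the paper's exact architecture:

```python
# Sketch of intra-task mutual attention between support and query patch tokens.
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.q_from_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.s_from_q = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_tokens, support_tokens):
        # Each token set attends to the other, so task-relevant patches
        # in both images reinforce each other.
        q_att, _ = self.q_from_s(query_tokens, support_tokens, support_tokens)
        s_att, _ = self.s_from_q(support_tokens, query_tokens, query_tokens)
        return q_att, s_att

q = torch.randn(2, 196, 256)  # (batch, patches, dim) query patch tokens
s = torch.randn(2, 196, 256)  # support patch tokens
attn = MutualAttention(dim=256)
q_out, s_out = attn(q, s)
print(q_out.shape, s_out.shape)  # torch.Size([2, 196, 256]) twice
```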
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models [81.71651422951074]
The Chain-of-Spot (CoS) method is a novel approach that enhances feature extraction by focusing on key regions of interest.
This technique allows LVLMs to access more detailed visual information without altering the original image resolution.
Our empirical findings demonstrate a significant improvement in LVLMs' ability to understand and reason about visual content.
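
The mechanism could be sketched roughly as below, with `vlm_locate` and `vlm_ask` as hypothetical stand-ins for the model's region-proposal and answering calls; the crop-then-answer loop is the part the entry describes:

```python
# Hypothetical Chain-of-Spot-style loop: ask the model for a region of
# interest, crop it at original resolution, and re-query with the zoomed view.
from PIL import Image

def chain_of_spot(image_path, question, vlm_ask, vlm_locate):
    img = Image.open(image_path)
    # Step 1: the model proposes a box (normalized coords) for the key region.
    x0, y0, x1, y1 = vlm_locate(img, question)  # e.g. (0.2, 0.3, 0.6, 0.8)
    w, h = img.size
    crop = img.crop((int(x0 * w), int(y0 * h), int(x1 * w), int(y1 * h)))
    # Step 2: answer using both the global view and the detailed crop.
    return vlm_ask([img, crop], question)
```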
arXiv Detail & Related papers (2024-03-19T17:59:52Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
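
A rough sketch of the meta-prompt idea: a bank of learnable tokens that a frozen diffusion UNet consumes in place of a text condition. The shapes and the `encoder_hidden_states` hookup (diffusers-style) are assumptions for illustration:

```python
# Sketch of meta prompts: learnable tokens replace the text condition when
# extracting diffusion features for a perception head. Illustrative only.
import torch
import torch.nn as nn

class MetaPrompts(nn.Module):
    def __init__(self, num_prompts=64, dim=768):
        super().__init__()
        # Learned jointly with the task head; the diffusion backbone stays frozen.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, batch_size):
        return self.prompts.unsqueeze(0).expand(batch_size, -1, -1)

# A frozen denoising UNet would consume these as its cross-attention context:
#   feats = unet(noisy_image, t, encoder_hidden_states=meta(batch_size))
meta = MetaPrompts()
print(meta(4).shape)  # torch.Size([4, 64, 768])
```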
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Self-Supervised Open-Ended Classification with Small Visual Language Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
By using models with approximately 1B parameters, we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z)
- A Dual-branch Self-supervised Representation Learning Framework for Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overhead in whole slide images (WSIs).
However, existing SSL approaches are not designed to handle multi-resolution WSIs, which limits their performance in learning discriminative image features.
We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
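
One way to picture a dual-branch multi-resolution setup is sketched below: two small encoders, one per magnification, trained to agree on matching regions. The toy encoders and cosine-alignment loss are placeholders, not DSF-WSI's actual objective:

```python
# Toy dual-branch sketch: a low-magnification context branch and a
# high-magnification detail branch aligned on the same tissue region.
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_encoder(dim):
    return nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(32, dim))

class DualBranch(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.context = small_encoder(dim)  # sees low-magnification tiles
        self.detail = small_encoder(dim)   # sees high-magnification tiles

    def forward(self, low_mag, high_mag):
        z1 = F.normalize(self.context(low_mag), dim=1)
        z2 = F.normalize(self.detail(high_mag), dim=1)
        # Alignment loss: views of the same region should agree.
        return 1 - (z1 * z2).sum(1).mean()

model = DualBranch()
loss = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
print(loss)
```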
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
- Exemplar Learning for Medical Image Segmentation [38.61378161105941]
We propose an Exemplar Learning-based Synthesis Net (ELSNet) framework for medical image segmentation.
ELSNet introduces two new modules for image segmentation: an exemplar-guided synthesis module and a pixel-prototype based contrastive embedding module.
We conduct experiments on several organ segmentation datasets and present an in-depth analysis.
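
The pixel-prototype contrastive idea can be sketched as an InfoNCE-style objective over pixel embeddings and class prototypes; the formulation below is a generic stand-in, not necessarily ELSNet's exact loss:

```python
# Generic pixel-prototype contrastive loss: pull each pixel embedding toward
# its class prototype and away from the others (InfoNCE over prototypes).
import torch
import torch.nn.functional as F

def pixel_prototype_contrastive(pixel_emb, labels, prototypes, tau=0.1):
    # pixel_emb: (N, C) embeddings, labels: (N,), prototypes: (K, C)
    pixel_emb = F.normalize(pixel_emb, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = pixel_emb @ prototypes.t() / tau  # (N, K) scaled cosine sims
    return F.cross_entropy(logits, labels)

emb = torch.randn(1024, 64)
lab = torch.randint(0, 5, (1024,))
protos = torch.randn(5, 64)
print(pixel_prototype_contrastive(emb, lab, protos))
```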
arXiv Detail & Related papers (2022-04-03T00:10:06Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories under a restricted annotation budget.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
- Pay Attention with Focus: A Novel Learning Scheme for Classification of Whole Slide Images [8.416553728391309]
We propose a novel two-stage approach to analyze whole slide images (WSIs).
First, we extract a set of representative patches (called a mosaic) from a WSI.
Each patch of a mosaic is encoded to a feature vector using a deep network.
In the second stage, a set of encoded patch-level features from a WSI is used to compute the primary diagnosis probability.
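
A minimal sketch of such a two-stage pipeline, using a frozen ResNet-18 as the patch encoder and simple mean pooling as a stand-in for the second-stage aggregator; both choices are illustrative assumptions:

```python
# Two-stage WSI sketch: encode mosaic patches with a frozen CNN, then
# aggregate patch features into a slide-level diagnosis probability.
import torch
import torch.nn as nn
import torchvision.models as models

encoder = models.resnet18(weights=None)
encoder.fc = nn.Identity()  # patch -> 512-d feature vector
encoder.eval()

classifier = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))

@torch.no_grad()
def encode_mosaic(patches):     # patches: (num_patches, 3, 224, 224)
    return encoder(patches)     # (num_patches, 512)

def slide_probability(patches):
    feats = encode_mosaic(patches)
    pooled = feats.mean(dim=0)  # mean pooling as a simple aggregator
    return torch.sigmoid(classifier(pooled))

mosaic = torch.randn(16, 3, 224, 224)  # dummy mosaic of 16 patches
print(slide_probability(mosaic))
```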
arXiv Detail & Related papers (2021-06-11T21:59:02Z)
- Whole Slide Images based Cancer Survival Prediction using Attention Guided Deep Multiple Instance Learning Networks [38.39901070720532]
Current image-based survival models are limited to key patches or clusters derived from Whole Slide Images (WSIs).
We propose Deep Attention Multiple Instance Survival Learning (DeepAttnMISL) by introducing both siamese MI-FCN and attention-based MIL pooling.
We evaluated our methods on two large cancer whole slide images datasets and our results suggest that the proposed approach is more effective and suitable for large datasets.
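
Attention-based MIL pooling itself is well established (Ilse et al.); a compact sketch of that aggregation step, not DeepAttnMISL's full siamese architecture, is below:

```python
# Attention-based MIL pooling: each patch feature gets a learned weight,
# and the bag embedding is the weighted sum over patches.
import torch
import torch.nn as nn

class AttnMILPool(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, feats):                    # feats: (num_patches, dim)
        w = torch.softmax(self.score(feats), 0)  # (num_patches, 1) weights
        return (w * feats).sum(0), w             # bag embedding + weights

pool = AttnMILPool(dim=512)
bag, weights = pool(torch.randn(100, 512))
print(bag.shape, weights.sum())  # torch.Size([512]), weights sum to 1
```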
arXiv Detail & Related papers (2020-09-23T14:31:15Z)
- Pairwise Relation Learning for Semi-supervised Gland Segmentation [90.45303394358493]
We propose a pairwise relation-based semi-supervised (PRS2) model for gland segmentation on histology images.
This model consists of a segmentation network (S-Net) and a pairwise relation network (PR-Net).
We evaluate our model against five recent methods on the GlaS dataset and three recent methods on the CRAG dataset.
arXiv Detail & Related papers (2020-08-06T15:02:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.