Related papers: Exploiting DINOv3-Based Self-Supervised Features for Robust Few-Shot Medical Image Segmentation

Exploiting DINOv3-Based Self-Supervised Features for Robust Few-Shot Medical Image Segmentation

URL: http://arxiv.org/abs/2601.08078v1
Date: Mon, 12 Jan 2026 23:44:25 GMT
Title: Exploiting DINOv3-Based Self-Supervised Features for Robust Few-Shot Medical Image Segmentation
Authors: Guoping Xu, Jayaram K. Udupa, Weiguo Lu, You Zhang,
Abstract summary: We propose DINO-AugSeg, a novel framework that leverages DINOv3 features to address the few-shot medical image segmentation challenge.<n>Experiments on six public benchmarks spanning five imaging modalities, including MRI, CT, ultrasound, endoscopy, and dermoscopy, demonstrate that DINO-AugSeg consistently outperforms existing methods.
Score: 3.2564581758935094
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning-based automatic medical image segmentation plays a critical role in clinical diagnosis and treatment planning but remains challenging in few-shot scenarios due to the scarcity of annotated training data. Recently, self-supervised foundation models such as DINOv3, which were trained on large natural image datasets, have shown strong potential for dense feature extraction that can help with the few-shot learning challenge. Yet, their direct application to medical images is hindered by domain differences. In this work, we propose DINO-AugSeg, a novel framework that leverages DINOv3 features to address the few-shot medical image segmentation challenge. Specifically, we introduce WT-Aug, a wavelet-based feature-level augmentation module that enriches the diversity of DINOv3-extracted features by perturbing frequency components, and CG-Fuse, a contextual information-guided fusion module that exploits cross-attention to integrate semantic-rich low-resolution features with spatially detailed high-resolution features. Extensive experiments on six public benchmarks spanning five imaging modalities, including MRI, CT, ultrasound, endoscopy, and dermoscopy, demonstrate that DINO-AugSeg consistently outperforms existing methods under limited-sample conditions. The results highlight the effectiveness of incorporating wavelet-domain augmentation and contextual fusion for robust feature representation, suggesting DINO-AugSeg as a promising direction for advancing few-shot medical image segmentation. Code and data will be made available on https://github.com/apple1986/DINO-AugSeg.

Related papers

Does DINOv3 Set a New Medical Vision Standard? [67.33543059306938]
This report investigates whether DINOv3 can serve as a powerful unified encoder for medical vision tasks without domain-specific pre-training.<n>We benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation.<n>Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks.
arXiv Detail & Related papers (2025-09-08T09:28:57Z)
MedDINOv3: How to adapt vision foundation models for medical image segmentation? [16.256590269050367]
We introduce MedDINOv3, a simple and effective framework for adapting DINOv3 to medical segmentation.<n>We perform domain-adaptive pretraining on CT-3M, a curated collection of 3.87M axial CT slices, using a multi-stage DINOv3 recipe.<n>MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks.
arXiv Detail & Related papers (2025-09-02T14:44:43Z)
MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis [10.082738539201804]
Recent vision-language foundation models deliver state-of-the-art results on natural image classification but falter on medical images due to domain shifts.<n>We introduce MedBridge, a lightweight multimodal adaptation framework that re-purposes pretrained VLMs for accurate medical image diagnosis.<n>MedBridge achieved over 6-15% improvement in AUC compared to state-of-the-art VLM adaptation methods in multi-label thoracic disease diagnosis.
arXiv Detail & Related papers (2025-05-27T19:37:51Z)
MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.<n>This paper investigates leveraging generative models to synthesize data, for training segmentation models for underrepresented modalities.<n>We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes [2.8498944632323755]
We propose an end-to-end hybrid architecture for medical image segmentation. We use Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images. Our architecture operates on a slice-by-slice basis to segment 3D volumes, capitilizing on the richly augmented dataset.
arXiv Detail & Related papers (2024-06-17T15:42:08Z)
Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [47.700298779672366]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.<n>Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation.<n>We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z)
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images. We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model [8.910108260704964]
Diffusion model (DPM) recently becomes one of the hottest topic in computer vision. We propose the first DPM based model toward general medical image segmentation tasks, which we named MedSegDiff. experimental results show that MedSegDiff outperforms state-of-the-art (SOTA) methods with considerable performance gap.
arXiv Detail & Related papers (2022-11-01T17:24:44Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
Cross-Modal Self-Attention Distillation for Prostate Cancer Segmentation [1.630747108038841]
How to use the multi-modal image features more efficiently is still a challenging problem in the field of medical image segmentation. We develop a cross-modal self-attention distillation network by fully exploiting the encoded information of the intermediate layers from different modalities. We evaluate our model in five-fold cross-validation on 358 MRI with biopsy confirmed.
arXiv Detail & Related papers (2020-11-08T06:19:13Z)
Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method. The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.