DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation
- URL: http://arxiv.org/abs/2602.07819v1
- Date: Sun, 08 Feb 2026 04:57:39 GMT
- Title: DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation
- Authors: Xinyu Liu, Guolei Sun,
- Abstract summary: Semi-supervised learning (SSL) has emerged as a critical paradigm for medical image segmentation. Prevailing SSL frameworks are fundamentally "inward-looking". We propose a paradigm shift to a multi-level "outward-looking" framework.
- Score: 14.732550189753697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) has emerged as a critical paradigm for medical image segmentation, mitigating the immense cost of dense annotations. However, prevailing SSL frameworks are fundamentally "inward-looking", recycling information and biases solely from within the target dataset. This design triggers a vicious cycle of confirmation bias under class imbalance, leading to the catastrophic failure to recognize minority classes. To dismantle this systemic issue, we propose a paradigm shift to a multi-level "outward-looking" framework. Our primary innovation is Foundational Knowledge Distillation (FKD), which looks outward beyond the confines of medical imaging by introducing a pre-trained visual foundation model, DINOv3, as an unbiased external semantic teacher. Instead of trusting the student's biased high confidence, our method distills knowledge from DINOv3's robust understanding of high semantic uniqueness, providing a stable, cross-domain supervisory signal that anchors the learning of minority classes. To complement this core strategy, we further look outward within the data by proposing Progressive Imbalance-aware CutMix (PIC), which creates a dynamic curriculum that adaptively forces the model to focus on minority classes in both labeled and unlabeled subsets. This layered strategy forms our framework, DINO-Mix, which breaks the vicious cycle of bias and achieves remarkable performance on challenging semi-supervised class-imbalanced medical image segmentation benchmarks Synapse and AMOS.
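The abstract does not say how PIC picks the regions it mixes, so the following is only a minimal sketch of one way an imbalance-aware CutMix with a progressive schedule could look. It is not the authors' implementation. The function names, the inverse-frequency weighting, and the `power` knob (0 gives uniform class sampling, 1 gives full inverse-frequency sampling, and a training loop could ramp it over time) are all assumptions made for illustration. It uses plain Python lists for 2D images and label maps.

```python
import random
from collections import Counter


def class_weights(label_maps, num_classes, power=1.0):
    """Normalized (inverse pixel frequency) ** power for each class.

    `power` is a hypothetical curriculum knob: 0.0 treats all observed
    classes equally, 1.0 favors rare classes the most.
    """
    counts = Counter(c for lab in label_maps for row in lab for c in row)
    total = sum(counts.values())
    inv = [(total / counts[c]) ** power if counts[c] else 0.0
           for c in range(num_classes)]
    norm = sum(inv)
    return [w / norm for w in inv]


def cutmix_minority(src_img, src_lab, dst_img, dst_lab, weights, box, rng):
    """Paste a box x box patch from src into dst (same spatial size).

    The patch is placed so that it covers a random pixel of a class drawn
    from the classes present in src, with probability proportional to
    `weights`. Assumes box <= height and box <= width.
    """
    h, w = len(src_lab), len(src_lab[0])
    present = sorted({c for row in src_lab for c in row})
    # Tiny epsilon keeps sampling valid if a present class has zero weight.
    probs = [weights[c] + 1e-12 for c in present]
    cls = rng.choices(present, weights=probs, k=1)[0]
    pixels = [(y, x) for y in range(h) for x in range(w)
              if src_lab[y][x] == cls]
    cy, cx = rng.choice(pixels)
    # Clamp the box so it stays inside the image and still covers (cy, cx).
    y0 = min(max(cy - box // 2, 0), h - box)
    x0 = min(max(cx - box // 2, 0), w - box)
    out_img = [row[:] for row in dst_img]
    out_lab = [row[:] for row in dst_lab]
    for y in range(y0, y0 + box):
        out_img[y][x0:x0 + box] = src_img[y][x0:x0 + box]
        out_lab[y][x0:x0 + box] = src_lab[y][x0:x0 + box]
    return out_img, out_lab, cls


if __name__ == "__main__":
    src_lab = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    src_img = [[1.0] * 4 for _ in range(4)]
    dst_img = [[0.0] * 4 for _ in range(4)]
    dst_lab = [[0] * 4 for _ in range(4)]
    w = class_weights([src_lab], num_classes=2, power=1.0)
    mixed_img, mixed_lab, cls = cutmix_minority(
        src_img, src_lab, dst_img, dst_lab, w, box=2, rng=random.Random(0))
    print("class weights:", w, "sampled class:", cls)
```

In this toy setup class 1 occupies one of sixteen pixels, so with `power=1.0` it receives weight 15/16 and is the class most likely to anchor the pasted patch. For unlabeled images, pseudo-labels could stand in for `src_lab`, which is again an assumption and not something the abstract specifies.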
Related papers
- Uncertainty-Aware Concept and Motion Segmentation for Semi-Supervised Angiography Videos [15.975499220724044]
We propose a SAM3-based Teacher-student framework with Motion-Aware consistency and Progressive Confidence Regularization. Our method utilizes SAM3's unique promptable concept segmentation design and innovates a SAM3-based teacher-student framework to maximize the performance potential of both the teacher and the student.
arXiv Detail & Related papers (2026-03-01T03:04:43Z) - HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation [2.964206587462833]
A novel semi-supervised segmentation framework, called HDC, is proposed incorporating adaptive consistency learning with a single-teacher architecture. The framework introduces a hierarchical distillation mechanism with two objectives: Correlation Guidance Loss for aligning feature representations and Mutual Information Loss for stabilizing noisy student learning.
arXiv Detail & Related papers (2025-04-14T04:52:24Z) - Boosting Semi-Supervised Medical Image Segmentation via Masked Image Consistency and Discrepancy Learning [2.5355185243767986]
We propose the Masked Image Consistency and Discrepancy Learning (MICD) framework with three key modules. The Cross Feature Consistency (CFC) module fortifies information exchange and model robustness. The Cross Model Discrepancy (CMD) module utilizes EMA teacher networks to oversee outputs and preserve branch diversity.
arXiv Detail & Related papers (2025-03-18T08:20:35Z) - You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data [54.56492110703343]
Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL). We propose FedYoYo to improve representation learning by distilling knowledge between weakly and strongly augmented local samples. We show FedYoYo achieves state-of-the-art results, even surpassing centralized logit adjustment methods by 5.4% under global long-tailed settings.
arXiv Detail & Related papers (2025-03-10T04:57:20Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - DHC: Dual-debiased Heterogeneous Co-training Framework for Class-imbalanced Semi-supervised Medical Image Segmentation [19.033066343869862]
We present a novel Dual-debiased Heterogeneous Co-training (DHC) framework for semi-supervised 3D medical image segmentation.
Specifically, we propose two loss weighting strategies, namely Distribution-aware Debiased Weighting (DistDW) and Difficulty-aware Debiased Weighting (DiffDW).
Our proposed framework brings significant improvements by using pseudo labels for debiasing and alleviating the class imbalance problem.
arXiv Detail & Related papers (2023-07-22T02:16:05Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - GraVIS: Grouping Augmented Views from Independent Sources for Dermatology Analysis [52.04899592688968]
We propose GraVIS, which is specifically optimized for learning self-supervised features from dermatology images.
GraVIS significantly outperforms its transfer learning and self-supervised learning counterparts in both lesion segmentation and disease classification tasks.
arXiv Detail & Related papers (2023-01-11T11:38:37Z) - Calibrating Label Distribution for Class-Imbalanced Barely-Supervised Knee Segmentation [11.21648118505577]
Semi-supervised learning (SSL) is highly desirable for training with insufficient labeled data.
We present a novel framework for barely-supervised knee segmentation with noisy and imbalanced labels.
Our method outperforms the state-of-the-art SSL methods.
arXiv Detail & Related papers (2022-05-07T12:53:06Z) - Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z) - One-shot Weakly-Supervised Segmentation in Medical Images [12.184590794655517]
We present an innovative framework for 3D medical image segmentation with one-shot and weakly-supervised settings.
A propagation-reconstruction network is proposed to project scribbles from annotated volume to unlabeled 3D images.
A dual-level feature denoising module is designed to refine the scribbles based on anatomical- and pixel-level features.
arXiv Detail & Related papers (2021-11-21T09:14:13Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.