Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
- URL: http://arxiv.org/abs/2603.02554v1
- Date: Tue, 03 Mar 2026 03:18:12 GMT
- Title: Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
- Authors: Chonghua Lv, Dong Zhao, Shuang Wang, Dou Quan, Ning Huyan, Nicu Sebe, Zhun Zhong
- Abstract summary: Generalizable Knowledge Distillation (GKD) is a multi-stage framework that explicitly enhances generalization. Experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods.
- Score: 73.32435804067883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) has been widely applied in semantic segmentation to compress large models, but conventional approaches primarily preserve in-domain accuracy while neglecting out-of-domain generalization, which is essential under distribution shifts. This limitation becomes more severe with the emergence of vision foundation models (VFMs): although VFMs exhibit strong robustness on unseen data, distilling them with conventional KD often compromises this ability. We propose Generalizable Knowledge Distillation (GKD), a multi-stage framework that explicitly enhances generalization. GKD decouples representation learning from task learning. In the first stage, the student acquires domain-agnostic representations through selective feature distillation, and in the second stage, these representations are frozen for task adaptation, thereby mitigating overfitting to visible domains. To further support transfer, we introduce a query-based soft distillation mechanism, where student features act as queries to teacher representations to selectively retrieve transferable spatial knowledge from VFMs. Extensive experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods, achieving average gains of +1.9% in foundation-to-foundation (F2F) and +10.6% in foundation-to-local (F2L) distillation. The code will be available at https://github.com/Younger-hua/GKD.
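The query-based soft distillation mechanism described in the abstract can be illustrated with a minimal sketch: each student feature vector acts as a query over the teacher's spatial feature vectors, and the attention-weighted retrieval becomes the regression target for that student position. This is an illustrative reconstruction from the abstract alone, not the paper's actual implementation; function names and the use of plain MSE are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def query_based_distillation_loss(student_feats, teacher_feats):
    """Each student feature vector queries the teacher's spatial features;
    the attention-weighted retrieval is the distillation target."""
    loss = 0.0
    for q in student_feats:
        # similarity of the student query to every teacher position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in teacher_feats]
        weights = softmax(scores)
        # attention-weighted retrieval of transferable teacher knowledge
        target = [sum(w * k[d] for w, k in zip(weights, teacher_feats))
                  for d in range(len(teacher_feats[0]))]
        # soft (MSE) distillation toward the retrieved target
        loss += sum((qi - ti) ** 2 for qi, ti in zip(q, target)) / len(q)
    return loss / len(student_feats)
```

Because the target is retrieved rather than spatially aligned, the student is only pulled toward teacher knowledge it can already express as a query, which is consistent with the selective-transfer motivation in the abstract.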
Related papers
- Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective [9.10299144143817]
Decoupled Knowledge Distillation (DKD) re-emphasizes the importance of logit knowledge through advanced decoupling strategies. We introduce an enhanced version, the Generalized Decoupled Knowledge Distillation (GDKD) loss. We demonstrate GDKD's superior performance over both the original DKD and other leading knowledge distillation methods.
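For context, the original DKD loss that GDKD builds on splits the KD objective into a target-class term (TCKD) and a non-target-class term (NCKD), each a KL divergence weighted independently. A minimal sketch of that decomposition, under the standard DKD formulation (the exact GDKD generalization is not described in the abstract):

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dkd_loss(t_logits, s_logits, target, alpha=1.0, beta=8.0, T=4.0):
    pt = softmax(t_logits, T)
    ps = softmax(s_logits, T)
    # TCKD: binary distribution over (target class, everything else)
    tckd = kl([pt[target], 1.0 - pt[target]], [ps[target], 1.0 - ps[target]])
    # NCKD: distributions renormalized over the non-target classes only
    nt = [p / (1.0 - pt[target]) for i, p in enumerate(pt) if i != target]
    ns = [p / (1.0 - ps[target]) for i, p in enumerate(ps) if i != target]
    nckd = kl(nt, ns)
    # decoupled weighting is the key difference from vanilla KD
    return alpha * tckd + beta * nckd
```

Decoupling matters because in vanilla KD the NCKD term is implicitly suppressed by the teacher's target-class confidence; weighting it separately (beta) recovers the "dark knowledge" among non-target classes.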
arXiv Detail & Related papers (2025-12-04T09:56:25Z) - Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift [62.50795372173394]
We conduct an exhaustive study to identify recipes for exploiting vision foundation models (VFMs) in unsupervised domain adaptation for semantic segmentation of lidar point clouds. The resulting pipeline achieves state-of-the-art results in four widely-recognized and challenging settings.
arXiv Detail & Related papers (2025-11-21T17:57:43Z) - UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations [5.382357091398666]
Unified Heterogeneous Knowledge Distillation (UHKD) is proposed as a framework that leverages intermediate features in the frequency domain for cross-architecture transfer. Experiments on CIFAR-100 and ImageNet-1K demonstrate gains of 5.59% and 0.83% over the latest method.
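The idea of matching features in the frequency domain can be sketched as follows: magnitude spectra summarize features in a way that does not depend on a particular architecture's spatial layout, so heterogeneous teacher and student features can be compared directly. This is an illustrative 1-D sketch with a naive DFT, not UHKD's actual procedure:

```python
import cmath

def dft_magnitude(x):
    """Naive discrete Fourier transform, returning magnitude spectrum."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def frequency_domain_kd_loss(student_feat, teacher_feat):
    """MSE between magnitude spectra of student and teacher features."""
    ms = dft_magnitude(student_feat)
    mt = dft_magnitude(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(ms, mt)) / len(ms)
```

In practice one would use a fast FFT (e.g. over 2-D feature maps) rather than this O(n²) DFT; the point is only that the loss is computed on spectra rather than raw activations.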
arXiv Detail & Related papers (2025-10-28T06:41:43Z) - Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning [33.16156949633519]
We propose a zero-external-dependency Augmented Moment Retrieval framework, AMR, to overcome local optima. AMR resolves ambiguous boundary information and semantic confusion in existing annotations without additional data. AMR achieves improved performance over prior state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-22T14:19:38Z) - Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models [54.517276878748305]
Vision foundation models (VFMs) are predominantly developed using data-centric methods. Many open-source vision models have been pretrained on domain-specific data. We present a new model-driven approach for training VFMs through joint knowledge transfer and preservation.
arXiv Detail & Related papers (2025-08-20T13:30:23Z) - RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation [43.991262005295596]
We introduce RS-MTDF (Multi-Teacher Distillation and Fusion), a novel framework to guide semi-supervised learning in remote sensing. RS-MTDF employs multiple frozen Vision Foundation Models (VFMs) as expert teachers, utilizing feature-level distillation to align student features with their robust representations. Our method outperforms existing approaches across various label ratios on LoveDA and secures the highest IoU in the majority of semantic categories.
arXiv Detail & Related papers (2025-06-10T13:15:15Z) - DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification [5.260725801393189]
Whole-slide images (WSIs) are critical for cancer diagnosis due to their ultra-high resolution and rich semantic content. We propose DSAGL (Dual-Stream Attention-Guided Learning), a novel weakly supervised classification framework that combines a teacher-student architecture with a dual-stream design.
arXiv Detail & Related papers (2025-05-29T11:07:16Z) - Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students. Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions. Our proposed method is evaluated across some homogeneous model pairs and arbitrary heterogeneous combinations of CNNs and ViTs with spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z) - FIXED: Frustratingly Easy Domain Generalization with Mixup [53.782029033068675]
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains.
A popular strategy is to augment training data to benefit generalization through methods such as Mixup [zhang2018mixup].
We propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX).
Our approach significantly outperforms nine state-of-the-art related methods, beating the best performing baseline by 6.5% on average in terms of test accuracy.
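The Mixup augmentation this line of work builds on is simple enough to show in full: each training example is a convex combination of two samples and their labels, with the mixing coefficient drawn from a Beta distribution. A minimal sketch (the FIXED paper's domain-invariant feature variant adds more on top of this):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Standard Mixup: convex combination of two samples and their
    one-hot labels, with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

For domain generalization the two samples are typically drawn from different source domains, so the interpolated examples populate the space between domains that the model would otherwise never see.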
arXiv Detail & Related papers (2022-11-07T09:38:34Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles in the feature-level.
Our method produces state-of-the-art results on the C-Driving dataset.
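Feature-level style swapping of the kind CPSS describes is commonly built on AdaIN-style statistics exchange: a patch's features are normalized, then given another patch's mean and standard deviation. A minimal sketch of that building block, as an illustration rather than the paper's exact method:

```python
import statistics

def swap_style(content, style, eps=1e-6):
    """AdaIN-style transfer: normalize the content features, then apply
    the style features' mean and standard deviation."""
    mc, sc = statistics.fmean(content), statistics.pstdev(content)
    ms, ss = statistics.fmean(style), statistics.pstdev(style)
    return [(v - mc) / (sc + eps) * ss + ms for v in content]

def cross_patch_style_swap(patches, rng):
    """Give each patch the (mean, std) statistics of a randomly chosen
    patch, diversifying styles while preserving normalized content."""
    order = list(range(len(patches)))
    rng.shuffle(order)
    return [swap_style(patches[i], patches[j]) for i, j in enumerate(order)]
```

Because only first- and second-order statistics move between patches, the semantic content of each patch is preserved while its "style" (illumination, color cast) is randomized, which is what makes such swaps useful for compound domain shifts.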
arXiv Detail & Related papers (2021-06-07T08:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.