Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
- URL: http://arxiv.org/abs/2110.15094v1
- Date: Wed, 27 Oct 2021 13:01:10 GMT
- Title: Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
- Authors: Gongfan Fang, Yifan Bao, Jie Song, Xinchao Wang, Donglin Xie,
Chengchao Shen, Mingli Song
- Abstract summary: Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain.
We introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD.
In MosaicKD, knowledge transfer from OOD data is achieved through a four-player min-max game, in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner, partially under the guidance of the pre-trained teacher.
- Score: 56.29595334715237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation~(KD) aims to craft a compact student model that
imitates the behavior of a pre-trained teacher in a target domain. Prior KD
approaches, despite their gratifying results, have largely relied on the
premise that \emph{in-domain} data is available to carry out the knowledge
transfer. Such an assumption, unfortunately, in many cases violates the
practical setting, since the original training data or even the data domain is
often unreachable due to privacy or copyright reasons. In this paper, we
attempt to tackle an ambitious task, termed as \emph{out-of-domain} knowledge
distillation~(OOD-KD), which allows us to conduct KD using only OOD data that
can be readily obtained at a very low cost. Admittedly, OOD-KD is by nature a
highly challenging task due to the agnostic domain gap. To this end, we
introduce a handy yet surprisingly efficacious approach, dubbed
as~\textit{MosaicKD}. The key insight behind MosaicKD is that samples
from various domains share common local patterns, even though their global
semantics may vary significantly; these shared local patterns, in turn, can be
re-assembled, analogous to mosaic tiling, to approximate the in-domain data and
to further alleviate the domain discrepancy. In MosaicKD, this is achieved
through a four-player min-max game, in which a generator, a discriminator, and a
student network are collectively trained in an adversarial manner, partially
under the guidance of a pre-trained teacher. We validate MosaicKD over
classification and semantic segmentation tasks across various benchmarks, and
demonstrate that it yields results much superior to the state-of-the-art
counterparts on OOD data. Our code is available at
\url{https://github.com/zju-vipa/MosaicKD}.
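As a concrete illustration of the four-player min-max game described in the abstract, the following is a minimal, hypothetical PyTorch-style training step. All module names, loss terms, and weights are assumptions made for exposition, not the authors' objective; the official implementation is in the repository linked above.

```python
# Hypothetical sketch of one MosaicKD-style update: generator G, discriminator D,
# student S, and a frozen pre-trained teacher T (assumed requires_grad=False).
# Loss terms and weights are illustrative assumptions, not the authors' formulation.
import torch
import torch.nn.functional as F

def four_player_step(G, D, S, T, ood_images, opt_g, opt_d, opt_s,
                     z_dim=256, w_local=1.0, w_conf=1.0, w_gap=1.0):
    """One adversarial update of generator, discriminator, and student,
    partially guided by the frozen teacher."""
    z = torch.randn(ood_images.size(0), z_dim, device=ood_images.device)

    # Discriminator: separate real OOD samples from generated ("mosaicked") ones.
    fake = G(z).detach()
    d_real, d_fake = D(ood_images), D(fake)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: locally realistic w.r.t. OOD data, confidently classified by the
    # teacher (in-domain-ness), and hard for the student (teacher-student gap).
    fake = G(z)
    d_fake = D(fake)
    t_logits, s_logits = T(fake), S(fake)
    loss_local = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_conf = F.cross_entropy(t_logits, t_logits.argmax(dim=1))
    loss_gap = -F.kl_div(F.log_softmax(s_logits, dim=1),
                         F.softmax(t_logits, dim=1), reduction="batchmean")
    loss_g = w_local * loss_local + w_conf * loss_conf + w_gap * loss_gap
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Student: imitate the frozen teacher on the synthesized samples.
    with torch.no_grad():
        fake = G(z)
        t_logits = T(fake)
    loss_s = F.kl_div(F.log_softmax(S(fake), dim=1),
                      F.softmax(t_logits, dim=1), reduction="batchmean")
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_d.item(), loss_g.item(), loss_s.item()
```

D is left abstract here; a discriminator that judges realism locally (e.g., at the patch level) is what would push the generator to reuse local patterns from the OOD data, in line with the "mosaic tiling" intuition of the abstract.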
Related papers
- DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning [3.763772992906958]
Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning framework must handle training and test data that exhibit a modality mismatch.
DisCoM-KD (Disentanglement-learning based Cross-Modal Knowledge Distillation) explicitly models different types of per-modality information.
arXiv Detail & Related papers (2024-08-05T13:44:15Z)
- AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation [33.208860361882095]
Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions.
However, directly applying models derived from DFKD to real-world applications suffers from significant performance degradation.
We propose a simple but effective method, AuG-KD, that selectively transfers the teacher's appropriate knowledge.
arXiv Detail & Related papers (2024-03-11T03:34:14Z)
- EAT: Towards Long-Tailed Out-of-Distribution Detection [55.380390767978554]
This paper addresses the challenging task of long-tailed OOD detection.
The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes.
We propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes, and (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data (an illustrative sketch of this overlay idea appears after this list).
arXiv Detail & Related papers (2023-12-14T13:47:13Z)
- Prior Knowledge Guided Unsupervised Domain Adaptation [82.9977759320565]
We propose a Knowledge-guided Unsupervised Domain Adaptation (KUDA) setting where prior knowledge about the target class distribution is available.
In particular, we consider two specific types of prior knowledge about the class distribution in the target domain: Unary Bound and Binary Relationship.
We propose a rectification module that uses such prior knowledge to refine model generated pseudo labels.
arXiv Detail & Related papers (2022-07-18T18:41:36Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation [69.09432302497116]
We propose a cutting-edge semi-supervised domain adaptation framework, namely Dual-Teacher++.
We design novel dual teacher models, including an inter-domain teacher model that explores cross-modality priors from the source domain (e.g., MR) and an intra-domain teacher model that investigates the knowledge beneath the unlabeled target domain.
In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data.
arXiv Detail & Related papers (2021-01-07T05:17:38Z)
- Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains [31.66937407833244]
We propose a Meta-Knowledge Distillation (Meta-KD) framework to build a meta-teacher model that captures transferable knowledge across domains.
Specifically, we first leverage a cross-domain learning process to train the meta-teacher on multiple domains, and then propose a meta-distillation algorithm to learn single-domain student models with guidance from the meta-teacher.
arXiv Detail & Related papers (2020-12-02T15:18:37Z)
- Unsupervised Multi-Target Domain Adaptation Through Knowledge Distillation [14.088776449829345]
Unsupervised domain adaptation (UDA) seeks to alleviate the problem of domain shift between the distributions of labeled source data and unlabeled target data.
In this paper, we propose a novel unsupervised MTDA approach to train a CNN that can generalize well across multiple target domains.
arXiv Detail & Related papers (2020-07-14T14:59:45Z)
- Inter-Region Affinity Distillation for Road Marking Segmentation [81.3619453527367]
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network.
Our method is known as Inter-Region Affinity KD (IntRA-KD).
arXiv Detail & Related papers (2020-04-11T04:26:37Z)
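The EAT entry above mentions overlaying tail-class images onto context-rich OOD data. Below is a small, hypothetical sketch of such an overlay augmentation; the patch scale, placement, and function name are illustrative assumptions rather than the EAT authors' exact recipe.

```python
# Hypothetical overlay augmentation: paste a downscaled tail-class image onto a
# context-rich OOD image so the tail class appears against diverse backgrounds.
# The label of the result stays the tail-class label.
import torch
import torch.nn.functional as F

def overlay_on_ood(tail_img: torch.Tensor, ood_img: torch.Tensor,
                   scale: float = 0.5) -> torch.Tensor:
    """tail_img and ood_img are float (C, H, W) tensors of the same spatial size."""
    _, h, w = tail_img.shape
    ph, pw = int(h * scale), int(w * scale)
    # Downscale the tail-class foreground to the patch size.
    patch = F.interpolate(tail_img.unsqueeze(0), size=(ph, pw),
                          mode="bilinear", align_corners=False)[0]
    top = torch.randint(0, h - ph + 1, (1,)).item()
    left = torch.randint(0, w - pw + 1, (1,)).item()
    out = ood_img.clone()                 # the OOD image supplies the context
    out[:, top:top + ph, left:left + pw] = patch
    return out
```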
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.