Direct Distillation between Different Domains
- URL: http://arxiv.org/abs/2401.06826v1
- Date: Fri, 12 Jan 2024 02:48:51 GMT
- Title: Direct Distillation between Different Domains
- Authors: Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou,
Chen Gong, Masashi Sugiyama
- Abstract summary: We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
- Score: 97.39470334253163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation (KD) aims to learn a compact student network using
knowledge from a large pre-trained teacher network, where both networks are
trained on data from the same distribution. However, in practical applications,
the student network may be required to perform in a new scenario (i.e., the
target domain), which usually exhibits significant differences from the known
scenario of the teacher network (i.e., the source domain). Traditional
domain adaptation techniques can be integrated with KD in a two-stage process
to bridge the domain gap, but the reliability of such two-stage approaches
tends to be limited by their high computational cost and the errors
accumulated across both stages. To solve this problem, we
propose a new one-stage method dubbed "Direct Distillation between Different
Domains" (4Ds). We first design a learnable adapter based on the Fourier
transform to separate the domain-invariant knowledge from the domain-specific
knowledge. Then, we build a fusion-activation mechanism to transfer the
valuable domain-invariant knowledge to the student network, while
simultaneously encouraging the adapter within the teacher network to learn the
domain-specific knowledge of the target data. As a result, the teacher network
can effectively transfer categorical knowledge that aligns with the target
domain of the student network. Extensive experiments on various benchmark
datasets demonstrate that our proposed 4Ds method successfully produces
reliable student networks and outperforms state-of-the-art approaches.
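
The paper's implementation is not included here, so the following PyTorch sketch is only a minimal, hypothetical reading of the two components named in the abstract: a Fourier-transform adapter that separates domain-specific amplitude from domain-invariant phase, and a fusion-activation gate that routes target-aligned teacher features into a standard distillation loss. The module names, the amplitude/phase split, and all hyper-parameters are assumptions, not the authors' code.

```python
# Hypothetical PyTorch sketch of the two 4Ds components described above.
# All names and the amplitude/phase split are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FourierAdapter(nn.Module):
    """Learnable adapter in the frequency domain of a teacher feature map.

    A common Fourier-based domain-adaptation reading: the amplitude
    spectrum carries domain-specific "style", while the phase spectrum
    carries domain-invariant content. The adapter re-weights only the
    amplitude, so it can absorb target-domain statistics while
    preserving content.
    """

    def __init__(self, channels: int):
        super().__init__()
        # One learnable scale per channel for the amplitude spectrum.
        self.amp_scale = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.fft2(feat, norm="ortho")
        amplitude, phase = spec.abs(), spec.angle()
        # Adapt the (domain-specific) amplitude; keep the phase intact.
        adapted = torch.polar(amplitude * self.amp_scale, phase)
        return torch.fft.ifft2(adapted, norm="ortho").real


class FusionActivation(nn.Module):
    """Channel-wise gate fusing original and adapted teacher features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, original: torch.Tensor, adapted: torch.Tensor) -> torch.Tensor:
        g = self.gate(adapted)  # per-channel weights in [0, 1]
        return g * adapted + (1.0 - g) * original


def distillation_loss(s_feat, t_feat, s_logits, t_logits,
                      tau: float = 4.0, alpha: float = 0.5):
    """Standard KD objective: feature matching plus softened-logit KL."""
    feat_loss = F.mse_loss(s_feat, t_feat)
    kl = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * tau ** 2
    return alpha * feat_loss + (1.0 - alpha) * kl
```

Under this reading, only the adapter and gate are trained on target-domain data while the pre-trained teacher backbone stays frozen, matching the abstract's point that the adapter absorbs domain-specific knowledge so the distilled signal aligns with the student's target domain.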
Related papers
- D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection [15.071470389431672]
Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain.
We propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain.
We validate our method through newly designed experimental protocols with well-known thermal datasets.
arXiv Detail & Related papers (2024-03-14T13:05:43Z)
- Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning [108.86940401125649]
This paper proposes a Dual-level Asymmetric Mutual Learning method (DAML) to learn discriminative representations from a broader knowledge scope with diverse embedding spaces.
Knowledge transfer between the two networks follows an asymmetric mutual learning scheme; a generic mutual-learning sketch follows this entry.
Experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T12:36:17Z)
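
DAML's dual-level, asymmetric design is not reproduced here; the sketch below shows only the generic deep mutual learning pattern that the blurb above builds on, with symmetric KL terms and an assumed temperature.

```python
# Generic deep mutual learning sketch (two peer networks teaching each
# other via KL terms). DAML's asymmetric, dual-level variant is not
# reproduced; the temperature is an illustrative assumption.
import torch.nn.functional as F


def mutual_learning_losses(logits_a, logits_b, tau: float = 1.0):
    """Each network mimics the other's softened predictions."""
    p_a = F.softmax(logits_a / tau, dim=1)
    p_b = F.softmax(logits_b / tau, dim=1)
    loss_a = F.kl_div(F.log_softmax(logits_a / tau, dim=1), p_b.detach(),
                      reduction="batchmean") * tau ** 2
    loss_b = F.kl_div(F.log_softmax(logits_b / tau, dim=1), p_a.detach(),
                      reduction="batchmean") * tau ** 2
    return loss_a, loss_b
```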
- Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation [51.21190751266442]
Domain adaptation (DA) tackles scenarios in which the test data do not follow the same distribution as the training data.
By learning from large-scale unlabeled samples, self-supervised learning has become a new trend in deep learning.
We propose a novel Self-Supervised Graph Neural Network (SSG) to enable more effective inter-task information exchange and knowledge sharing.
arXiv Detail & Related papers (2022-04-08T03:37:56Z)
- Robust Ensembling Network for Unsupervised Domain Adaptation [20.152004296679138]
We propose a Robust Ensembling Network (REN) for unsupervised domain adaptation (UDA).
REN mainly comprises a teacher network and a student network; the student performs standard domain adaptation training and updates the weights of the teacher network.
To improve the basic ability of the student network, we use a consistency constraint to balance the error between the student and teacher networks; a generic sketch of this teacher-student pattern follows this entry.
arXiv Detail & Related papers (2021-08-21T09:19:13Z)
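
The REN blurb above describes a student that updates the teacher's weights plus a consistency constraint between the two networks. A common realization of that pattern is the mean-teacher setup sketched below; the EMA decay and MSE consistency are assumed defaults, not details from the paper.

```python
# Generic teacher-student consistency sketch (mean-teacher style), in the
# spirit of the REN blurb above. EMA decay and MSE consistency are common
# defaults, not details from the paper.
import copy
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher, student, decay: float = 0.999):
    """Move each teacher parameter toward the student's (EMA update)."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)


def consistency_loss(student_logits, teacher_logits):
    """Penalize disagreement between student and teacher predictions."""
    return F.mse_loss(student_logits.softmax(dim=1),
                      teacher_logits.softmax(dim=1))


# Usage: the teacher starts as a frozen copy of the student.
student = torch.nn.Linear(32, 10)            # stand-in for a real backbone
teacher = copy.deepcopy(student).requires_grad_(False)

x = torch.randn(8, 32)                       # unlabeled target-domain batch
loss = consistency_loss(student(x), teacher(x))
loss.backward()
ema_update(teacher, student)                 # teacher tracks the student
```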
- Network-Agnostic Knowledge Transfer for Medical Image Segmentation [2.25146058725705]
We propose a knowledge transfer approach from a teacher to a student network wherein we train the student on an independent transferal dataset.
We study knowledge transfer from a single teacher, the combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers.
The proposed algorithm is effective for knowledge transfer and easily tunable.
arXiv Detail & Related papers (2021-01-23T19:06:14Z)
- Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation [69.09432302497116]
We propose a semi-supervised domain adaptation framework, namely Dual-Teacher++.
We design novel dual teacher models: an inter-domain teacher model to explore cross-modality priors from the source domain (e.g., MR) and an intra-domain teacher model to investigate the knowledge beneath the unlabeled target domain; a generic two-teacher distillation sketch follows this entry.
In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data.
arXiv Detail & Related papers (2021-01-07T05:17:38Z)
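
As a rough illustration of how a student can distill from the two teachers described above, the sketch below simply averages softened-logit KL terms against both; the equal weights and temperature are assumptions, and Dual-Teacher++'s reliability-based transfer is not reproduced.

```python
# Sketch of combining knowledge from two teachers (inter-domain and
# intra-domain), loosely following the Dual-Teacher++ blurb. Weights and
# temperature are illustrative assumptions.
import torch.nn.functional as F


def dual_teacher_kd(student_logits, inter_logits, intra_logits,
                    tau: float = 2.0, w_inter: float = 0.5,
                    w_intra: float = 0.5):
    """Weighted softened-logit KL against both teachers."""
    log_p = F.log_softmax(student_logits / tau, dim=1)

    def kd(t_logits):
        return F.kl_div(log_p, F.softmax(t_logits / tau, dim=1),
                        reduction="batchmean") * tau ** 2

    return w_inter * kd(inter_logits) + w_intra * kd(intra_logits)
```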
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both the primal and dual forms of the Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes a lower bound on the mutual information between the teacher and student networks (see the sketch below).
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression, and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
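
The contrastive objective mentioned above is, in generic form, an InfoNCE loss between paired teacher and student embeddings, which maximizes a lower bound on their mutual information. The sketch below shows that generic pattern, not WCoRD's actual Wasserstein critic; the temperature is an assumed default.

```python
# Generic contrastive (InfoNCE-style) teacher-student objective, i.e. the
# kind of mutual-information lower bound the WCoRD blurb refers to. The
# exact WCoRD critic is not reproduced here.
import torch
import torch.nn.functional as F


def contrastive_kd(student_emb, teacher_emb, tau: float = 0.07):
    """InfoNCE: each student embedding should match its own teacher
    embedding (positive) against the rest of the batch (negatives)."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb.detach(), dim=1)   # no gradient to teacher
    logits = s @ t.t() / tau                       # (N, N) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)        # maximizes an MI lower bound
```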
- Domain Adaption for Knowledge Tracing [65.86619804954283]
We propose a novel adaptable framework, namely Adaptable Knowledge Tracing (AKT), to address the DAKT problem.
For the first aspect, we incorporate educational characteristics (e.g., slip, guess, question texts) based on deep knowledge tracing (DKT) to obtain a well-performing knowledge tracing model.
For the second aspect, we propose and adopt three domain adaptation processes. First, we pre-train an auto-encoder to select useful source instances for target model training.
arXiv Detail & Related papers (2020-01-14T15:04:48Z)