Representation Transfer by Optimal Transport
- URL: http://arxiv.org/abs/2007.06737v2
- Date: Fri, 26 Feb 2021 06:34:09 GMT
- Title: Representation Transfer by Optimal Transport
- Authors: Xuhong Li, Yves Grandvalet, Rémi Flamary, Nicolas Courty, Dejing Dou
- Abstract summary: We use optimal transport to quantify the match between two representations.
This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher.
- Score: 34.77292648424614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning generic representations with deep networks requires massive training
samples and significant computational resources. To learn a new, specific task, an
important issue is how to transfer the generic teacher's representation to a
student network. In this paper, we propose to use a metric between
representations that is based on a functional view of neurons. We use optimal
transport to quantify the match between two representations, yielding a
distance that embeds some invariances inherent to the representation of deep
networks. This distance defines a regularizer promoting the similarity of the
student's representation with that of the teacher. Our approach can be used in
any learning context where representation transfer is applicable. We experiment
here on two standard settings: inductive transfer learning, where the teacher's
representation is transferred to a student network of the same architecture for a
new related task, and knowledge distillation, where the teacher's
representation is transferred to a student of simpler architecture for the same
task (model compression). Our approach also lends itself to solving new
learning problems; we demonstrate this by showing how to directly transfer the
teacher's representation to a student with a simpler architecture for a new related
task.
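To make the idea concrete, here is a minimal, illustrative sketch (not the authors' released code) of an OT-based regularizer between two representations, assuming the POT library (`pip install pot`). Each neuron is described by its activation profile over a batch (the "functional view" of neurons), and an entropy-regularized optimal transport cost between the teacher's and student's neuron populations plays the role of the regularizer. The function name `ot_representation_distance` and the uniform neuron weights are assumptions for illustration.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def ot_representation_distance(teacher_acts, student_acts, reg=0.05):
    """Illustrative OT distance between two representations.

    teacher_acts: (n_teacher_neurons, batch) activation matrix.
    student_acts: (n_student_neurons, batch) activation matrix.
    Returns a scalar entropy-regularized OT cost.
    """
    # Ground cost: squared Euclidean distance between neuron activation profiles.
    M = ot.dist(teacher_acts, student_acts, metric="sqeuclidean")
    M = M / (M.max() + 1e-12)  # normalize for numerical stability

    # Uniform mass on each neuron; OT is invariant to neuron permutations.
    a = np.full(teacher_acts.shape[0], 1.0 / teacher_acts.shape[0])
    b = np.full(student_acts.shape[0], 1.0 / student_acts.shape[0])

    # Entropy-regularized OT cost (Sinkhorn); ot.emd2(a, b, M) gives the exact cost.
    return ot.sinkhorn2(a, b, M, reg)

# Hypothetical usage: add the OT term to the task loss with a weight beta,
# so the student's representation is pulled toward the teacher's.
# teacher = np.random.randn(128, 64); student = np.random.randn(96, 64)
# total_loss = task_loss + beta * ot_representation_distance(teacher, student)
```

In training, this term would simply be added to the task loss with a tunable weight, which is how a representation-matching regularizer is typically used; the exact cost, marginals, and solver in the paper may differ from this sketch.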
Related papers
- How a student becomes a teacher: learning and forgetting through
Spectral methods [1.1470070927586018]
In theoretical ML, the teacher paradigm is often employed as an effective metaphor for real-life tuition.
In this work, we take a leap forward by proposing a radically different optimization scheme.
Working in this framework, we could isolate a stable student substructure that mirrors the true complexity of the teacher.
arXiv Detail & Related papers (2023-10-19T09:40:30Z) - Improving Ensemble Distillation With Weight Averaging and Diversifying
Perturbation [22.87106703794863]
It motivates distilling knowledge from the ensemble teacher into a smaller student network.
We propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers.
We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student.
arXiv Detail & Related papers (2022-06-30T06:23:03Z) - Investigating the Properties of Neural Network Representations in
Reinforcement Learning [35.02223992335008]
This paper empirically investigates the properties of representations that support transfer in reinforcement learning.
We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment.
We develop a systematic approach to better understand why some representations transfer better than others.
arXiv Detail & Related papers (2022-03-30T00:14:26Z) - Representation Consolidation for Training Expert Students [54.90754502493968]
We show that a multi-head, multi-task distillation method is sufficient to consolidate representations from task-specific teacher(s) and improve downstream performance.
Our method can also combine the representational knowledge of multiple teachers trained on one or multiple domains into a single model.
arXiv Detail & Related papers (2021-07-16T17:58:18Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive
Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representations.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - Network-Agnostic Knowledge Transfer for Medical Image Segmentation [2.25146058725705]
We propose a knowledge transfer approach from a teacher to a student network wherein we train the student on an independent transferal dataset.
We studied knowledge transfer from a single teacher, a combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers.
The proposed algorithm is effective for knowledge transfer and easily tunable.
arXiv Detail & Related papers (2021-01-23T19:06:14Z) - Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z) - CrossTransformers: spatially-aware few-shot transfer [92.33252608837947]
Given new tasks with very little data, modern vision systems degrade remarkably quickly.
We show how the neural network representations which underpin modern vision systems are subject to supervision collapse.
We propose self-supervised learning to encourage general-purpose features that transfer better.
arXiv Detail & Related papers (2020-07-22T15:37:08Z)