Complementary Relation Contrastive Distillation
- URL: http://arxiv.org/abs/2103.16367v1
- Date: Mon, 29 Mar 2021 02:43:03 GMT
- Title: Complementary Relation Contrastive Distillation
- Authors: Jinguo Zhu and Shixiang Tang and Dapeng Chen and Shijie Yu and Yakun
Liu and Aijun Yang and Mingzhe Rong and Xiaohua Wang
- Abstract summary: We propose a novel knowledge distillation method, namely Complementary Relation Contrastive Distillation (CRCD)
We estimate the mutual relation in an anchor-based way and distill the anchor-student relation under the supervision of its corresponding anchor-teacher relation.
Experiments on different benchmarks demonstrate the effectiveness of our proposed CRCD.
- Score: 13.944372633594085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation aims to transfer representation ability from a teacher
model to a student model. Previous approaches focus on either individual
representation distillation or inter-sample similarity preservation. However, we
argue that the inter-sample relation conveys abundant information and needs to
be distilled in a more effective way. In this paper, we propose a novel
knowledge distillation method, namely Complementary Relation Contrastive
Distillation (CRCD), to transfer the structural knowledge from the teacher to
the student. Specifically, we estimate the mutual relation in an anchor-based
way and distill the anchor-student relation under the supervision of its
corresponding anchor-teacher relation. To make it more robust, mutual relations
are modeled by two complementary elements: the feature and its gradient.
Furthermore, the lower bound of mutual information between the anchor-teacher
relation distribution and the anchor-student relation distribution is maximized
via relation contrastive loss, which can distill both the sample representation
and the inter-sample relations. Experiments on different benchmarks demonstrate
the effectiveness of our proposed CRCD.
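The abstract describes the core mechanism: relations between each sample and a set of anchors are computed in both the teacher and the student feature spaces, and the anchor-student relation is aligned to the anchor-teacher relation with a contrastive objective that maximizes a lower bound of the mutual information between the two relation distributions. The sketch below is a minimal, illustrative reading of that description, not the authors' implementation: the linear projection head, cosine similarity, temperature value, anchor source, and the omission of the complementary gradient-based relation are all assumptions made here for brevity.
```python
# Minimal PyTorch sketch of an anchor-based relation contrastive distillation
# loss in the spirit of CRCD. Illustrative only; hyperparameters and design
# details are assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationContrastiveLoss(nn.Module):
    """Distill anchor-student relations under supervision of anchor-teacher
    relations with an InfoNCE-style objective, which maximizes a lower bound
    of the mutual information between the two relation distributions."""

    def __init__(self, s_dim: int, t_dim: int, temperature: float = 0.1):
        super().__init__()
        # Project student features into the teacher's feature space
        # (a common distillation trick; dimensions here are hypothetical).
        self.proj = nn.Linear(s_dim, t_dim)
        self.temperature = temperature

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor,
                anchors: torch.Tensor) -> torch.Tensor:
        # f_s: (B, s_dim) student features, f_t: (B, t_dim) teacher features,
        # anchors: (K, t_dim) anchor features drawn from the teacher.
        f_s = F.normalize(self.proj(f_s), dim=1)
        f_t = F.normalize(f_t, dim=1)
        anchors = F.normalize(anchors, dim=1)

        # Anchor-student and anchor-teacher relations: cosine similarity of
        # each sample to every anchor, shape (B, K).
        r_s = f_s @ anchors.t()
        r_t = f_t @ anchors.t()

        # Contrastive matching of relation vectors: for sample i, its own
        # anchor-teacher relation r_t[i] is the positive, and the relations
        # of all other samples serve as negatives.
        logits = F.normalize(r_s, dim=1) @ F.normalize(r_t, dim=1).t()
        logits = logits / self.temperature
        targets = torch.arange(r_s.size(0), device=r_s.device)
        return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    torch.manual_seed(0)
    loss_fn = RelationContrastiveLoss(s_dim=128, t_dim=256)
    f_s, f_t = torch.randn(32, 128), torch.randn(32, 256)
    anchors = torch.randn(64, 256)            # e.g. teacher features of anchor samples
    print(loss_fn(f_s, f_t, anchors).item())  # scalar distillation loss
```
In the full method, a second, analogously structured term over feature gradients would complement this feature-based relation loss.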
Related papers
- Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z)
- Faithful Knowledge Distillation [75.59907631395849]
We focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples?
These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting.
arXiv Detail & Related papers (2023-06-07T13:41:55Z)
- Improving Continual Relation Extraction by Distinguishing Analogous Semantics [11.420578494453343]
Continual relation extraction aims to learn constantly emerging relations while avoiding forgetting the learned relations.
Existing works store a small number of typical samples to re-train the model for alleviating forgetting.
We conduct an empirical study on existing works and observe that their performance is severely affected by analogous relations.
arXiv Detail & Related papers (2023-05-11T07:32:20Z)
- CORSD: Class-Oriented Relational Self Distillation [16.11986532440837]
Knowledge distillation is an effective model compression method, but it still has some limitations.
We propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations.
arXiv Detail & Related papers (2023-04-28T16:00:31Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- Information Theoretic Representation Distillation [20.802135299032308]
We forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional.
Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks.
We also shed light on a new state of the art for binary quantisation.
arXiv Detail & Related papers (2021-12-01T12:39:50Z)
- PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval [87.68667887072324]
We propose a novel approach that leverages query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval.
To implement our approach, we make three major technical contributions by introducing formal formulations of the two kinds of similarity relations.
Our approach significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions datasets.
arXiv Detail & Related papers (2021-08-13T02:07:43Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training [49.9995628166064]
We propose CTEG, a model equipped with two mechanisms to learn to decouple easily-confused relations.
On the one hand, an Entity-Guided Attention (EGA) mechanism is introduced to guide the attention to filter out information that causes confusion.
On the other hand, a Confusion-Aware Training (CAT) method is proposed to explicitly learn to distinguish relations.
arXiv Detail & Related papers (2020-10-21T11:07:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.