CORSD: Class-Oriented Relational Self Distillation
- URL: http://arxiv.org/abs/2305.00918v1
- Date: Fri, 28 Apr 2023 16:00:31 GMT
- Title: CORSD: Class-Oriented Relational Self Distillation
- Authors: Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang,
Kaisheng Ma
- Abstract summary: Knowledge distillation is an effective model compression method, but it has some limitations.
We propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations.
- Score: 16.11986532440837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is an effective model compression method, but it has
some limitations: (1) feature-based distillation methods focus only on distilling the
feature map and fail to transfer the relations among data examples; (2) relational
distillation methods are either limited to handcrafted relation-extraction functions,
such as the L2 norm, or weak at modeling inter- and intra-class relations. In addition,
the feature divergence between heterogeneous teacher-student architectures may lead to
inaccurate transfer of relational knowledge. In this work, we propose a novel training
framework named Class-Oriented Relational Self Distillation (CORSD) to address these
limitations. Trainable relation networks are designed to extract relations from
structured data inputs, and they enable the whole model to classify samples better by
transferring relational knowledge from the deepest layer of the model to the shallow
layers. In addition, auxiliary classifiers are proposed to make the relation networks
capture class-oriented relations that benefit the classification task. Experiments
demonstrate that CORSD achieves remarkable improvements: compared to the baseline,
averaged accuracy boosts of 3.8%, 1.5%, and 4.5% are observed on CIFAR100, ImageNet,
and CUB-200-2011, respectively.
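The abstract only outlines the framework, so the following is a minimal sketch of how such a
training step could look, not the authors' implementation. It assumes per-stage globally pooled
features, small MLP relation networks over feature pairs, auxiliary classifiers at the shallow
stages, and a hypothetical same-class pair head that pushes the deepest relations to be
class-oriented; the deepest stage's logits and relations then act as targets for the shallow stages.
```python
# Hedged sketch of a CORSD-style self-distillation step (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationNet(nn.Module):
    """Trainable relation extractor: maps each ordered pair of pooled features to a relation vector."""
    def __init__(self, feat_dim, rel_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, rel_dim))

    def forward(self, feats):                      # feats: (B, D) pooled features
        b, d = feats.shape
        pairs = torch.cat([feats.unsqueeze(1).expand(b, b, d),
                           feats.unsqueeze(0).expand(b, b, d)], dim=-1)
        return self.mlp(pairs)                     # (B, B, rel_dim) pairwise relations


def corsd_like_loss(stage_feats, stage_logits, labels, rel_nets, pair_head, T=4.0):
    """stage_feats / stage_logits / rel_nets are ordered shallow -> deep (last = deepest layer)."""
    deep_logits = stage_logits[-1]
    deep_rel = rel_nets[-1](stage_feats[-1])                       # relations at the deepest layer

    loss = F.cross_entropy(deep_logits, labels)                    # main classification loss
    # Hypothetical class-oriented auxiliary task: predict whether a pair shares a class.
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    loss = loss + F.binary_cross_entropy_with_logits(pair_head(deep_rel).squeeze(-1), same_class)

    for feats, logits, rel_net in zip(stage_feats[:-1], stage_logits[:-1], rel_nets[:-1]):
        loss = loss + F.cross_entropy(logits, labels)              # auxiliary classifier at shallow stage
        loss = loss + (T * T) * F.kl_div(                          # logit self-distillation from deepest head
            F.log_softmax(logits / T, dim=1),
            F.softmax(deep_logits.detach() / T, dim=1), reduction="batchmean")
        loss = loss + F.mse_loss(rel_net(feats), deep_rel.detach())  # relational self-distillation
    return loss


# Usage with dummy tensors (a 3-stage backbone with 256-d pooled features, 100 classes):
B, D, C = 8, 256, 100
feats = [torch.randn(B, D, requires_grad=True) for _ in range(3)]
logits = [torch.randn(B, C, requires_grad=True) for _ in range(3)]
rel_nets = nn.ModuleList([RelationNet(D) for _ in range(3)])
pair_head = nn.Linear(64, 1)
loss = corsd_like_loss(feats, logits, torch.randint(0, C, (B,)), rel_nets, pair_head)
loss.backward()
```
Detaching the deepest logits and relations when they serve as targets keeps the self-distillation
one-directional (deep teaches shallow), which matches the transfer direction described in the abstract.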
Related papers
- Feature Interaction Fusion Self-Distillation Network For CTR Prediction [14.12775753361368]
Click-Through Rate (CTR) prediction plays a vital role in recommender systems, online advertising, and search engines.
We propose FSDNet, a CTR prediction framework incorporating a plug-and-play fusion self-distillation module.
arXiv Detail & Related papers (2024-11-12T03:05:03Z) - TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students.
Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions.
The proposed method is evaluated across homogeneous model pairs and arbitrary heterogeneous combinations of CNNs, ViTs, and spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z) - One-for-All: Bridge the Gap Between Heterogeneous Architectures in
Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Continual Contrastive Finetuning Improves Low-Resource Relation
Extraction [34.76128090845668]
Relation extraction has been particularly challenging in low-resource scenarios and domains.
Recent literature has tackled low-resource RE by self-supervised learning.
We propose to pretrain and finetune the RE model using consistent objectives of contrastive learning.
arXiv Detail & Related papers (2022-12-21T07:30:22Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance while using less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - Complementary Relation Contrastive Distillation [13.944372633594085]
We propose a novel knowledge distillation method, namely Complementary Relation Contrastive Distillation (CRCD).
We estimate the mutual relation in an anchor-based way and distill the anchor-student relation under the supervision of its corresponding anchor-teacher relation.
Experiments on different benchmarks demonstrate the effectiveness of our proposed CRCD.
arXiv Detail & Related papers (2021-03-29T02:43:03Z) - Similarity Transfer for Knowledge Distillation [25.042405967561212]
Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one.
We propose a novel method called similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between categories of multiple samples.
Results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
arXiv Detail & Related papers (2021-03-18T06:54:59Z) - Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
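Several of the entries above (e.g. WCoRD, CRCD) build on contrastive objectives between teacher
and student representations whose minimization maximizes a lower bound on their mutual information.
The sketch below is a generic InfoNCE-style version of that idea, with assumed projection heads and
a temperature; it is not WCoRD's dual-form Wasserstein critic or CRCD's anchor-based relation estimator.
```python
# Hedged sketch of a contrastive representation-distillation objective (generic illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveDistillLoss(nn.Module):
    def __init__(self, student_dim, teacher_dim, embed_dim=128, temperature=0.1):
        super().__init__()
        self.proj_s = nn.Linear(student_dim, embed_dim)   # project student features
        self.proj_t = nn.Linear(teacher_dim, embed_dim)   # project teacher features
        self.t = temperature

    def forward(self, f_s, f_t):
        # f_s: (B, Ds) student features, f_t: (B, Dt) teacher features for the same batch.
        z_s = F.normalize(self.proj_s(f_s), dim=1)
        z_t = F.normalize(self.proj_t(f_t.detach()), dim=1)   # the teacher is not updated
        sims = z_s @ z_t.t() / self.t                          # (B, B) similarity logits
        # Positive pair = the same sample's (student, teacher) features; other samples are negatives.
        targets = torch.arange(f_s.size(0), device=f_s.device)
        return F.cross_entropy(sims, targets)


# Usage with dummy features (batch of 16, student 256-d, teacher 512-d):
crit = ContrastiveDistillLoss(student_dim=256, teacher_dim=512)
loss = crit(torch.randn(16, 256), torch.randn(16, 512))
loss.backward()
```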
This list is automatically generated from the titles and abstracts of the papers on this site.