Proto2Proto: Can you recognize the car, the way I do?
- URL: http://arxiv.org/abs/2204.11830v1
- Date: Mon, 25 Apr 2022 17:59:30 GMT
- Title: Proto2Proto: Can you recognize the car, the way I do?
- Authors: Monish Keswani, Sriranjani Ramakrishnan, Nishant Reddy, Vineeth N
Balasubramanian
- Abstract summary: We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation.
Our approach aims to add interpretability to the "dark" knowledge transferred from the teacher to the shallower student model.
Our experiments show that the proposed method indeed achieves interpretability transfer from teacher to student while simultaneously exhibiting competitive performance.
- Score: 23.09799187888976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prototypical methods have recently gained a lot of attention due to their
intrinsically interpretable nature, which is obtained through the prototypes. With
growing use cases of model reuse and distillation, there is a need to also
study transfer of interpretability from one model to another. We present
Proto2Proto, a novel method to transfer interpretability of one prototypical
part network to another via knowledge distillation. Our approach aims to add
interpretability to the "dark" knowledge transferred from the teacher to the
shallower student model. We propose two novel losses: "Global Explanation" loss
and "Patch-Prototype Correspondence" loss to facilitate such a transfer. Global
Explanation loss forces the student prototypes to be close to teacher
prototypes, and Patch-Prototype Correspondence loss enforces the local
representations of the student to be similar to those of the teacher. Further,
we propose three novel metrics to evaluate the student's proximity to the
teacher as measures of interpretability transfer in our settings. We
qualitatively and quantitatively demonstrate the effectiveness of our method on
CUB-200-2011 and Stanford Cars datasets. Our experiments show that the proposed
method indeed achieves interpretability transfer from teacher to student while
simultaneously exhibiting competitive performance.
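Below is a minimal PyTorch-style sketch of the two proposed losses as described in the abstract. The tensor shapes, the one-to-one prototype correspondence, the squared-L2 (MSE) distances, and the names (student_protos, teacher_patches, proto2proto_step, lambda_glob, lambda_patch) are illustrative assumptions, not the authors' reference implementation.

```python
import torch.nn.functional as F


def global_explanation_loss(student_protos, teacher_protos):
    """Pull each student prototype toward the corresponding teacher prototype.

    Assumed shapes: (num_prototypes, proto_dim) for both tensors, with a
    one-to-one prototype correspondence (an assumption of this sketch).
    """
    return F.mse_loss(student_protos, teacher_protos)


def patch_prototype_correspondence_loss(student_patches, teacher_patches):
    """Push the student's local (patch) representations toward the teacher's.

    Assumed shapes: (batch, num_patches, feat_dim); in practice a projection
    may be needed to align teacher and student feature dimensions.
    """
    return F.mse_loss(student_patches, teacher_patches)


def proto2proto_step(logits, labels, s_protos, t_protos, s_patches, t_patches,
                     lambda_glob=1.0, lambda_patch=1.0):
    """Hypothetical training objective: task loss plus the two transfer losses."""
    task = F.cross_entropy(logits, labels)
    glob = global_explanation_loss(s_protos, t_protos.detach())
    patch = patch_prototype_correspondence_loss(s_patches, t_patches.detach())
    return task + lambda_glob * glob + lambda_patch * patch
```

The teacher quantities are detached so that only the student is updated; the weighting of the two transfer losses against the task loss is a free choice in this sketch.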
Related papers
- Feature Representation Transferring to Lightweight Models via Perception Coherence [3.3975558777609915]
We propose a method for transferring feature representations from larger teacher models to lightweight student models.
Our method outperforms or achieves on-par performance compared to strong baseline methods for representation transfer.
arXiv Detail & Related papers (2025-05-10T10:55:06Z)
- Predefined Prototypes for Intra-Class Separation and Disentanglement [10.005120138175206]
Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered.
We propose to predefine prototypes following human-specified criteria, which simplifies the training pipeline and brings several advantages.
arXiv Detail & Related papers (2024-06-23T15:52:23Z)
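The entry above builds on the standard prototypical-learning setup, where the embeddings of a class cluster around a prototype that is here fixed in advance rather than learned. A minimal sketch of classification against predefined, frozen prototypes follows; the orthogonal initialization and cosine scoring are illustrative assumptions, not the paper's actual criteria.

```python
import torch
import torch.nn.functional as F

num_classes, embed_dim = 10, 64

# Predefined, frozen prototypes; an orthogonal set is used here as one possible
# human-specified criterion (the paper's actual criteria may differ).
prototypes = torch.nn.init.orthogonal_(torch.empty(num_classes, embed_dim))


def prototype_logits(embeddings, protos=prototypes):
    """Score each embedding against every fixed prototype via cosine similarity."""
    e = F.normalize(embeddings, dim=-1)
    p = F.normalize(protos, dim=-1)
    return e @ p.t()  # (batch, num_classes)


# Training then reduces to cross-entropy on these logits, which pulls each
# class's embeddings toward its predefined prototype.
emb = torch.randn(8, embed_dim)
labels = torch.randint(0, num_classes, (8,))
loss = F.cross_entropy(prototype_logits(emb), labels)
```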
- I2CKD: Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation [1.433758865948252]
This paper proposes a new knowledge distillation method tailored for image semantic segmentation, termed Intra- and Inter-Class Knowledge Distillation (I2CKD).
The method focuses on capturing and transferring knowledge between the intermediate layers of the teacher (cumbersome model) and the student (compact model).
arXiv Detail & Related papers (2024-03-27T12:05:22Z)
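A rough sketch of intermediate-layer, class-wise feature distillation in the spirit of the entry above: per-class prototypes are pooled from teacher and student feature maps using the ground-truth mask and matched both directly (intra-class) and through their pairwise similarity structure (inter-class). The masked pooling and MSE matching are assumptions of this sketch, not I2CKD's exact formulation.

```python
import torch
import torch.nn.functional as F


def class_prototypes(feat, labels, num_classes):
    """Average an intermediate feature map per ground-truth class.

    feat: (B, C, H, W) intermediate features; labels: (B, H, W) class indices.
    Returns (num_classes, C) prototypes (zero for classes absent from the batch).
    """
    onehot = F.one_hot(labels, num_classes).permute(0, 3, 1, 2).float()  # (B, K, H, W)
    summed = torch.einsum('bkhw,bchw->kc', onehot, feat)
    counts = onehot.sum(dim=(0, 2, 3)).clamp(min=1).unsqueeze(1)
    return summed / counts


def i2ckd_style_loss(student_feat, teacher_feat, labels, num_classes):
    # Channel and spatial dimensions are assumed aligned (e.g. via a projection).
    ps = class_prototypes(student_feat, labels, num_classes)
    pt = class_prototypes(teacher_feat, labels, num_classes).detach()
    intra = F.mse_loss(ps, pt)  # intra-class: match each class prototype
    sim_s = F.normalize(ps, dim=1) @ F.normalize(ps, dim=1).t()
    sim_t = F.normalize(pt, dim=1) @ F.normalize(pt, dim=1).t()
    inter = F.mse_loss(sim_s, sim_t)  # inter-class: match class-similarity structure
    return intra + inter


loss = i2ckd_style_loss(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16),
                        torch.randint(0, 19, (2, 16, 16)), num_classes=19)
```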
- Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation [12.177329445930276]
We propose a novel CNN-to-ViT KD framework, dubbed C2VKD.
We first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations.
We then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes.
arXiv Detail & Related papers (2023-10-11T07:45:37Z)
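The pixel-wise decoupled distillation described above separates, at every pixel, the target-class probability from the distribution over the non-target classes. The sketch below follows the generic decoupled-KD recipe applied per pixel and is only a hedged approximation of the paper's PDD module; the visual-linguistic feature distillation (VLFD) module is not reproduced.

```python
import torch
import torch.nn.functional as F


def pixelwise_decoupled_kd(student_logits, teacher_logits, labels,
                           T=1.0, alpha=1.0, beta=1.0):
    """Decoupled KD applied per pixel: the target-class probability and the
    distribution over the remaining classes are distilled separately.
    Shapes: logits (B, K, H, W), labels (B, H, W) with valid class indices."""
    B, K, H, W = student_logits.shape
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, K) / T
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, K) / T
    y = labels.reshape(-1)
    mask = F.one_hot(y, K).bool()

    # Target part: binary target-vs-rest probabilities, matched with KL.
    ps = F.softmax(s, dim=1).gather(1, y.unsqueeze(1))
    pt = F.softmax(t, dim=1).gather(1, y.unsqueeze(1))
    bin_s = torch.cat([ps, 1 - ps], dim=1).clamp_min(1e-8)
    bin_t = torch.cat([pt, 1 - pt], dim=1).clamp_min(1e-8)
    target_kd = F.kl_div(bin_s.log(), bin_t, reduction='batchmean')

    # Non-target part: mask out the target class with a large negative value
    # (finite, to avoid NaNs) and distill the renormalized remainder.
    NEG = -1e4
    nt_s = F.log_softmax(s.masked_fill(mask, NEG), dim=1)
    nt_t = F.softmax(t.masked_fill(mask, NEG), dim=1)
    nontarget_kd = F.kl_div(nt_s, nt_t, reduction='batchmean')

    return (alpha * target_kd + beta * nontarget_kd) * T * T


loss = pixelwise_decoupled_kd(torch.randn(2, 19, 16, 16), torch.randn(2, 19, 16, 16),
                              torch.randint(0, 19, (2, 16, 16)))
```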
- It's All in the Head: Representation Knowledge Distillation through Classifier Sharing [0.29360071145551075]
We introduce two approaches for enhancing representation distillation using classifier sharing between the teacher and student.
We show the effectiveness of the proposed methods on various datasets and tasks, including image classification, fine-grained classification, and face verification.
arXiv Detail & Related papers (2022-01-18T13:10:36Z)
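One plausible reading of classifier sharing, sketched below: the student's features are also pushed through the teacher's frozen classifier head, so its representation stays compatible with the teacher's decision boundaries. The specific loss combination and the frozen-head choice are assumptions of this sketch; the paper introduces two variants that may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 512, 200

# Teacher's classifier head, frozen; the student keeps its own head as usual.
# Feature dimensions are assumed compatible (otherwise a projection is needed).
teacher_head = nn.Linear(feat_dim, num_classes).requires_grad_(False)
student_head = nn.Linear(feat_dim, num_classes)


def classifier_sharing_loss(student_feat, labels, lam=1.0):
    """Train the student through its own head and through the teacher's frozen
    head, keeping its representation compatible with the teacher's decision
    boundaries."""
    own = F.cross_entropy(student_head(student_feat), labels)
    shared = F.cross_entropy(teacher_head(student_feat), labels)
    return own + lam * shared


loss = classifier_sharing_loss(torch.randn(4, feat_dim),
                               torch.randint(0, num_classes, (4,)))
```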
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impact of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
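A hedged sketch of how a prototype-centered contrastive term can complement the usual query-centered objective in few-shot learning: queries are classified over prototypes as usual, and in addition each prototype is treated as an anchor whose own-class queries are the positives. The exact PAL formulation and its attentive hybrid learning mechanism are not reproduced here.

```python
import torch
import torch.nn.functional as F


def query_centered_loss(queries, protos, q_labels, tau=0.1):
    """Standard ProtoNet-style objective: classify each query over prototypes."""
    logits = F.normalize(queries, dim=1) @ F.normalize(protos, dim=1).t() / tau
    return F.cross_entropy(logits, q_labels)


def prototype_centered_loss(queries, protos, q_labels, tau=0.1):
    """Anchor on each prototype instead: its own-class queries are positives and
    all other queries act as negatives."""
    sims = F.normalize(protos, dim=1) @ F.normalize(queries, dim=1).t() / tau  # (K, Q)
    log_p = F.log_softmax(sims, dim=1)  # distribute each prototype's mass over queries
    pos = (q_labels.unsqueeze(0) == torch.arange(protos.size(0)).unsqueeze(1)).float()
    return -(log_p * pos).sum(dim=1).div(pos.sum(dim=1).clamp(min=1)).mean()


K, Q, D = 5, 15, 64
protos, queries = torch.randn(K, D), torch.randn(Q, D)
q_labels = torch.randint(0, K, (Q,))
loss = query_centered_loss(queries, protos, q_labels) + \
       prototype_centered_loss(queries, protos, q_labels)
```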
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
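The global transfer described above can be approximated with an InfoNCE-style contrastive objective, where teacher and student features of the same image are positives and all other batch pairings are negatives; minimizing it maximizes a lower bound on teacher-student mutual information. This is a simplified stand-in for WCoRD's dual-form Wasserstein objective, which additionally uses a learned critic.

```python
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_feat, teacher_feat, tau=0.07):
    """Teacher/student features of the same image are positives; every other
    pairing in the batch is a negative. Minimizing the cross-entropy over the
    similarity matrix maximizes an InfoNCE lower bound on teacher-student
    mutual information."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1).detach()
    logits = s @ t.t() / tau                    # (B, B) similarity matrix
    targets = torch.arange(s.size(0))           # positives sit on the diagonal
    return F.cross_entropy(logits, targets)


loss = contrastive_distillation_loss(torch.randn(16, 128), torch.randn(16, 128))
```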
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
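The Mean Teacher component above relies on a teacher whose weights are an exponential moving average of the student's; unlabeled images are then supervised by the teacher's predictions through a consistency loss. A generic EMA-update sketch follows; the mask guidance and perturbation-sensitive sample mining are paper-specific and omitted.

```python
import torch


@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Mean-Teacher update: the teacher's weights track an exponential moving
    average of the student's, so unlabeled images can be supervised by the
    teacher's more stable predictions through a consistency loss."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)


# Usage sketch: teacher and student share the same architecture.
student = torch.nn.Linear(10, 2)
teacher = torch.nn.Linear(10, 2)
teacher.load_state_dict(student.state_dict())
ema_update(teacher, student)
```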
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
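For reference, the standard knowledge-distillation objective mentioned above: soften both output distributions with a temperature and blend the resulting KL term with the usual cross-entropy on ground-truth labels. This is the generic Hinton-style formulation; the detector-specific task-adaptive regularization of the paper is not reproduced.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft targets from the teacher (temperature-scaled KL) blended with the
    usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


loss = kd_loss(torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,)))
```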
- Knowledge distillation via adaptive instance normalization [52.91164959767517]
We propose a new knowledge distillation method based on transferring feature statistics from the teacher to the student.
Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher.
We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings.
arXiv Detail & Related papers (2020-03-09T17:50:12Z)
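The "standard way" this paper says it goes beyond is matching per-channel feature statistics between student and teacher; a minimal sketch of that baseline follows. The paper's adaptive-instance-normalization extension itself is not reproduced here.

```python
import torch
import torch.nn.functional as F


def feature_statistics_loss(student_feat, teacher_feat):
    """Match per-channel, per-instance feature statistics (mean and standard
    deviation) between student and teacher. Shapes: (B, C, H, W); channel
    counts are assumed aligned, e.g. via a 1x1 projection."""
    s_mean, s_std = student_feat.mean(dim=(2, 3)), student_feat.std(dim=(2, 3))
    t_mean, t_std = teacher_feat.mean(dim=(2, 3)), teacher_feat.std(dim=(2, 3))
    return F.mse_loss(s_mean, t_mean.detach()) + F.mse_loss(s_std, t_std.detach())


loss = feature_statistics_loss(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))
```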
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.