Natural Statistics of Network Activations and Implications for Knowledge
Distillation
- URL: http://arxiv.org/abs/2106.00368v1
- Date: Tue, 1 Jun 2021 10:18:30 GMT
- Title: Natural Statistics of Network Activations and Implications for Knowledge
Distillation
- Authors: Michael Rotman and Lior Wolf
- Abstract summary: We study the natural statistics of deep neural network activations at various layers.
We show that these statistics follow a power law and, both analytically and empirically, that its exponent increases at a linear rate with depth.
We present a method for performing Knowledge Distillation (KD) based on these findings.
- Score: 95.15239893744791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a manner analogous to the study of natural image statistics, we study
the natural statistics of deep neural network activations at various layers.
As we show, these statistics, like image statistics, follow a power law. We also
show, both analytically and empirically, that the exponent of this power law
increases at a linear rate with depth.
As a direct implication of our discoveries, we present a method for performing
Knowledge Distillation (KD). While classical KD methods consider the logits of
the teacher network, more recent methods obtain a leap in performance by
considering the activation maps. These methods, however, rely on metrics suited
to comparing images. We propose to employ two additional loss terms based on
the spectral properties of the intermediate activation maps. The proposed
method obtains state-of-the-art results on multiple image recognition KD
benchmarks.
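As an illustration of how such spectral terms might look, the following is a minimal PyTorch sketch. It assumes the spectral statistics are taken from the singular values of each flattened intermediate activation map; the function names, the log-log regression used to estimate the power-law exponent, and the L1 form of the penalties are illustrative choices, not the authors' exact formulation.

```python
# Hedged sketch (PyTorch): estimating the power-law exponent of an activation
# map's spectrum and using spectral statistics as extra KD loss terms.
# The exact losses in the paper may differ; this only illustrates the idea.
import torch
import torch.nn.functional as F


def spectrum(act: torch.Tensor) -> torch.Tensor:
    """Singular values of an activation map of shape (B, C, H, W),
    flattened to a (B, C, H*W) matrix per sample."""
    b, c, h, w = act.shape
    mat = act.reshape(b, c, h * w)
    return torch.linalg.svdvals(mat)  # (B, min(C, H*W)), sorted descending


def power_law_exponent(sv: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Least-squares slope of log(singular value) vs. log(rank);
    a power law s_k ~ k^(-alpha) gives slope -alpha."""
    k = torch.arange(1, sv.shape[-1] + 1, dtype=sv.dtype, device=sv.device)
    x = torch.log(k)
    y = torch.log(sv + eps)
    x = x - x.mean()
    slope = (x * (y - y.mean(dim=-1, keepdim=True))).sum(-1) / (x * x).sum()
    return -slope  # one exponent estimate per sample


def spectral_kd_loss(student_act: torch.Tensor,
                     teacher_act: torch.Tensor) -> torch.Tensor:
    """Illustrative spectral terms for one intermediate layer: match the
    normalized spectra and the fitted power-law exponents."""
    s_sv, t_sv = spectrum(student_act), spectrum(teacher_act)
    n = min(s_sv.shape[-1], t_sv.shape[-1])
    s_sv = s_sv[..., :n] / (s_sv[..., :1] + 1e-8)
    t_sv = t_sv[..., :n] / (t_sv[..., :1] + 1e-8)
    spec_term = F.l1_loss(s_sv, t_sv)
    exp_term = F.l1_loss(power_law_exponent(s_sv), power_law_exponent(t_sv))
    return spec_term + exp_term
```

In practice, terms of this kind would typically be added, with tuned weights, to the usual logit-based KD loss at several intermediate layers.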
Related papers
- Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition [58.41784639847413]
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals.
In this paper, a multi-teacher PKD (MT-PKDOT) method with self-distillation is introduced to align diverse teacher representations before distilling them to the student.
Results indicate that our proposed method can outperform SOTA PKD methods.
arXiv Detail & Related papers (2024-08-16T22:11:01Z) - Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation
for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used in this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
arXiv Detail & Related papers (2022-05-04T11:05:18Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a generic sketch of this estimator follows the related-papers list below).
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Evaluation of Saliency-based Explainability Method [2.733700237741334]
A class of Explainable AI (XAI) methods provides saliency maps that highlight the parts of an image a CNN model attends to when classifying it, as a way to explain its workings.
These methods provide an intuitive way for users to understand predictions made by CNNs.
Beyond quantitative computational tests, the vast majority of evidence that these methods are valuable is anecdotal.
arXiv Detail & Related papers (2021-06-24T05:40:50Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - Similarity Transfer for Knowledge Distillation [25.042405967561212]
Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one.
We propose a novel method called similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between categories of multiple samples.
Results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
arXiv Detail & Related papers (2021-03-18T06:54:59Z) - Visualization of Supervised and Self-Supervised Neural Networks via
Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our method to provide class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z) - A combined full-reference image quality assessment approach based on
convolutional activation maps [0.0]
The goal of full-reference image quality assessment (FR-IQA) is to predict the quality of an image as perceived by human observers using its pristine reference counterpart.
In this study, we explore a novel, combined approach which predicts the perceptual quality of a distorted image by compiling a feature vector from convolutional activation maps.
arXiv Detail & Related papers (2020-10-19T10:00:29Z)
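For the k-NN density estimation mentioned in the OSAKD entry above, the following is a generic sketch of the textbook estimator p(x) ≈ k / (N · V_d · r_k(x)^d) in a feature space; it is not the OSAKD authors' implementation, and the function name and choice of Euclidean distance are illustrative.

```python
# Generic k-NN non-parametric density estimate in a feature space.
# r_k(x) is the distance to the k-th nearest reference sample; the unit-ball
# volume is computed in log space to avoid overflow in high dimensions.
import math
import torch


def knn_density(features: torch.Tensor, queries: torch.Tensor, k: int = 5) -> torch.Tensor:
    """features: (N, d) reference samples; queries: (M, d) points to evaluate.
    Returns an (M,) tensor of density estimates. If queries are taken from
    `features` themselves, mask out the zero self-distances first."""
    n, d = features.shape
    dists = torch.cdist(queries, features)      # (M, N) Euclidean distances
    r_k = dists.kthvalue(k, dim=1).values       # distance to k-th nearest neighbor
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_density = (math.log(k) - math.log(n) - log_unit_ball
                   - d * torch.log(r_k.clamp_min(1e-12)))
    return log_density.exp()
```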
This list is automatically generated from the titles and abstracts of the papers in this site.