Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for
Classification
- URL: http://arxiv.org/abs/2208.08741v1
- Date: Thu, 18 Aug 2022 09:47:31 GMT
- Title: Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for
Classification
- Authors: Quanshi Zhang, Xu Cheng, Yilan Chen, Zhefan Rao
- Abstract summary: This paper provides a new perspective to explain the success of knowledge distillation, based on information theory.
A knowledge point is defined as an input unit whose information is discarded much less than that of other input units.
We propose three hypotheses for knowledge distillation based on the quantification of knowledge points.
- Score: 27.98287660940717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to traditional learning from scratch, knowledge distillation
sometimes makes the DNN achieve superior performance. This paper provides a new
perspective to explain the success of knowledge distillation, i.e., quantifying
knowledge points encoded in intermediate layers of a DNN for classification,
based on information theory. To this end, we model the signal processing
in a DNN as layer-wise information discarding. A knowledge point is
defined as an input unit whose information is discarded much less than that of
other input units. On this basis, we propose three hypotheses for knowledge distillation
based on the quantification of knowledge points. 1. The DNN learning from
knowledge distillation encodes more knowledge points than the DNN learning from
scratch. 2. Knowledge distillation makes the DNN more likely to learn different
knowledge points simultaneously. In comparison, the DNN learning from scratch
tends to encode various knowledge points sequentially. 3. The DNN learning from
knowledge distillation is often optimized more stably than the DNN learning
from scratch. In order to verify the above hypotheses, we design three types of
metrics with annotations of foreground objects to analyze feature
representations of the DNN, i.e., the quantity and the quality of
knowledge points, the learning speed of different knowledge points, and the
stability of optimization directions. In experiments, we diagnosed various DNNs
for different classification tasks, i.e., image classification, 3D point cloud
classification, binary sentiment classification, and question answering, which
verified the above hypotheses.
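The distillation setup analyzed in this abstract can be illustrated with the standard soft-label objective (a minimal NumPy sketch of Hinton-style distillation, not the authors' training code; the temperature and weighting values are illustrative assumptions):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy with the hard labels and
    (b) KL divergence to the teacher's temperature-softened outputs."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes comparable.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * hard + (1 - alpha) * (T ** 2) * kl)

# Toy example: two samples, three classes.
student = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.2]])
teacher = np.array([[3.0, 1.0, -2.0], [0.0, 2.5, 0.5]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

The paper's hypotheses concern how training against the softened teacher term, versus the hard-label term alone, changes which knowledge points the student encodes and how stably it is optimized.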
Related papers
- BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation [20.34272550256856]
Spiking neural networks (SNNs) mimic biological neural systems to convey information via discrete spikes.
Our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets.
arXiv Detail & Related papers (2024-07-12T08:17:24Z) - Beyond Factuality: A Comprehensive Evaluation of Large Language Models
as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z) - Shared Growth of Graph Neural Networks via Prompted Free-direction
Knowledge Distillation [39.35619721100205]
We propose the first Free-direction Knowledge Distillation framework via reinforcement learning for graph neural networks (GNNs).
Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them.
Experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin.
arXiv Detail & Related papers (2023-07-02T10:03:01Z) - Boosting Graph Neural Networks via Adaptive Knowledge Distillation [18.651451228086643]
Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks.
Knowledge distillation (KD) is developed to combine the diverse knowledge from multiple models.
We propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN.
arXiv Detail & Related papers (2022-10-12T04:48:50Z) - Explainability Tools Enabling Deep Learning in Future In-Situ Real-Time
Planetary Explorations [58.720142291102135]
Deep learning (DL) has proven to be an effective machine learning and computer vision technique.
Most deep neural network (DNN) architectures are so complex that they are considered a 'black box'.
In this paper, we used integrated gradients to describe the attributions of each neuron to the output classes.
It provides a set of explainability tools (ET) that opens the black box of a DNN so that the individual contribution of neurons to category classification can be ranked and visualized.
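Integrated gradients, the attribution method this entry relies on, can be sketched as follows (a minimal NumPy implementation using a numerically estimated gradient; the tiny linear scorer is a hypothetical stand-in for a trained DNN logit):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    # Central-difference estimate of df/dx_i for each feature.
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline, steps=50):
    """Attribute f(x) - f(baseline) to each input feature by averaging
    gradients along the straight-line path from baseline to input
    (Riemann-sum approximation of the path integral)."""
    grads = np.zeros_like(x, dtype=float)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (x - baseline)
        grads += numerical_gradient(f, point)
    return (x - baseline) * (grads / steps)

# Hypothetical model: a fixed linear scorer standing in for a DNN output.
w = np.array([0.5, -1.0, 2.0])
model = lambda x: float(w @ x)
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(model, x, baseline)
```

By the completeness axiom, the attributions sum to `f(x) - f(baseline)`, which is what lets per-neuron or per-feature contributions be ranked as described above.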
arXiv Detail & Related papers (2022-01-15T07:10:00Z) - What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space [88.37185513453758]
We propose a method to visualize and understand the class-wise knowledge learned by deep neural networks (DNNs) under different settings.
Our method searches for a single predictive pattern in the pixel space to represent the knowledge learned by the model for each class.
In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns.
arXiv Detail & Related papers (2021-01-18T06:38:41Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z) - Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z) - Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs).
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z) - Explaining Knowledge Distillation by Quantifying the Knowledge [27.98287660940717]
This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts.
Knowledge distillation makes the DNN learn more visual concepts than learning from raw data.
arXiv Detail & Related papers (2020-03-07T18:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.