Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for
Classification
- URL: http://arxiv.org/abs/2208.08741v1
- Date: Thu, 18 Aug 2022 09:47:31 GMT
- Title: Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for
Classification
- Authors: Quanshi Zhang, Xu Cheng, Yilan Chen, Zhefan Rao
- Abstract summary: This paper provides a new perspective to explain the success of knowledge distillation, based on information theory.
A knowledge point is defined as an input unit whose information is discarded much less than that of other input units.
We propose three hypotheses for knowledge distillation based on the quantification of knowledge points.
- Score: 27.98287660940717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to traditional learning from scratch, knowledge distillation
sometimes makes the DNN achieve superior performance. This paper provides a new
perspective to explain the success of knowledge distillation, i.e., quantifying
knowledge points encoded in intermediate layers of a DNN for classification,
based on information theory. To this end, we model the signal processing
in a DNN as layer-wise information discarding. A knowledge point is
defined as an input unit whose information is discarded much less than that of
other input units. On this basis, we propose three hypotheses for knowledge distillation
based on the quantification of knowledge points. 1. The DNN learning from
knowledge distillation encodes more knowledge points than the DNN learning from
scratch. 2. Knowledge distillation makes the DNN more likely to learn different
knowledge points simultaneously. In comparison, the DNN learning from scratch
tends to encode various knowledge points sequentially. 3. The DNN learning from
knowledge distillation is often optimized more stably than the DNN learning
from scratch. In order to verify the above hypotheses, we design three types of
metrics with annotations of foreground objects to analyze feature
representations of the DNN, i.e., the quantity and the quality of
knowledge points, the learning speed of different knowledge points, and the
stability of optimization directions. In experiments, we diagnosed various DNNs
for different classification tasks, i.e., image classification, 3D point cloud
classification, binary sentiment classification, and question answering, which
verified the above hypotheses.
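The distillation setup analyzed in this abstract can be illustrated with the standard soft-label objective (a minimal NumPy sketch of Hinton-style distillation, not the authors' training code; the temperature and weighting values are illustrative assumptions):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy with the hard labels and
    (b) KL divergence to the teacher's temperature-softened outputs."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes comparable.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * hard + (1 - alpha) * (T ** 2) * kl)

# Toy example: two samples, three classes.
student = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.2]])
teacher = np.array([[3.0, 1.0, -2.0], [0.0, 2.5, 0.5]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

The paper's hypotheses concern how training against the softened teacher term, versus the hard-label term alone, changes which knowledge points the student encodes and how stably it is optimized.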
Related papers
- BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation [20.34272550256856]
Spiking neural networks (SNNs) mimic biological neural systems to convey information via discrete spikes.
Our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets.
arXiv Detail & Related papers (2024-07-12T08:17:24Z) - Beyond Factuality: A Comprehensive Evaluation of Large Language Models
as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z) - Shared Growth of Graph Neural Networks via Prompted Free-direction
Knowledge Distillation [39.35619721100205]
We propose the first Free-direction Knowledge Distillation framework via reinforcement learning for graph neural networks (GNNs).
Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them.
Experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin.
arXiv Detail & Related papers (2023-07-02T10:03:01Z) - Boosting Graph Neural Networks via Adaptive Knowledge Distillation [18.651451228086643]
Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks.
Knowledge distillation (KD) is developed to combine the diverse knowledge from multiple models.
We propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN.
arXiv Detail & Related papers (2022-10-12T04:48:50Z) - Explainability Tools Enabling Deep Learning in Future In-Situ Real-Time
Planetary Explorations [58.720142291102135]
Deep learning (DL) has proven to be an effective machine learning and computer vision technique.
Most deep neural network (DNN) architectures are so complex that they are considered a 'black box'.
In this paper, we used integrated gradients to describe the attributions of each neuron to the output classes.
It provides a set of explainability tools (ET) that opens the black box of a DNN so that the individual contribution of neurons to category classification can be ranked and visualized.
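Integrated gradients, the attribution method this entry relies on, can be sketched as follows (a minimal NumPy implementation using a numerically estimated gradient; the tiny linear scorer is a hypothetical stand-in for a trained DNN logit):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    # Central-difference estimate of df/dx_i for each feature.
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline, steps=50):
    """Attribute f(x) - f(baseline) to each input feature by averaging
    gradients along the straight-line path from baseline to input
    (Riemann-sum approximation of the path integral)."""
    grads = np.zeros_like(x, dtype=float)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (x - baseline)
        grads += numerical_gradient(f, point)
    return (x - baseline) * (grads / steps)

# Hypothetical model: a fixed linear scorer standing in for a DNN output.
w = np.array([0.5, -1.0, 2.0])
model = lambda x: float(w @ x)
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(model, x, baseline)
```

By the completeness axiom, the attributions sum to `f(x) - f(baseline)`, which is what lets per-neuron or per-feature contributions be ranked as described above.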
arXiv Detail & Related papers (2022-01-15T07:10:00Z) - What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space [88.37185513453758]
We propose a method to visualize and understand the class-wise knowledge learned by deep neural networks (DNNs) under different settings.
Our method searches for a single predictive pattern in the pixel space to represent the knowledge learned by the model for each class.
In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns.
arXiv Detail & Related papers (2021-01-18T06:38:41Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z) - Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z) - Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs).
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z) - Explaining Knowledge Distillation by Quantifying the Knowledge [27.98287660940717]
This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts.
Knowledge distillation makes the DNN learn more visual concepts than learning from raw data.
arXiv Detail & Related papers (2020-03-07T18:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.