Related papers: Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph

URL: http://arxiv.org/abs/2405.08547v2
Date: Thu, 16 May 2024 05:25:01 GMT
Title: Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph
Authors: Zhiwei Wang, Jun Huang, Longhua Ma, Chengyu Wu, Hongyu Ma
Abstract summary: In visual tasks, large teacher models capture essential features and deep information, enhancing performance. We propose a distillation framework based on graph knowledge, including a multi-level feature alignment strategy. We emphasize spectral embedding (SE) as a key technique in our distillation process, which merges the student's feature space with the relational knowledge and structural complexities similar to the teacher network.
Score: 8.646512035461994
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In visual tasks, large teacher models capture essential features and deep information, enhancing performance. However, distilling this information into smaller student models often leads to performance loss due to structural differences and capacity limitations. To tackle this, we propose a distillation framework based on graph knowledge, including a multi-level feature alignment strategy and an attention-guided mechanism to provide a targeted learning trajectory for the student model. We emphasize spectral embedding (SE) as a key technique in our distillation process, which merges the student's feature space with the relational knowledge and structural complexities similar to the teacher network. This method captures the teacher's understanding in a graph-based representation, enabling the student model to more accurately mimic the complex structural dependencies present in the teacher model. Compared to methods that focus only on specific distillation areas, our strategy not only considers key features within the teacher model but also endeavors to capture the relationships and interactions among feature sets, encoding these complex pieces of information into a graph structure to understand and utilize the dynamic relationships among these pieces of information from a global perspective. Experiments show that our method outperforms previous feature distillation methods on the CIFAR-100, MS-COCO, and Pascal VOC datasets, proving its efficiency and applicability.
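
Illustrative sketch (not the authors' implementation): the abstract gives no formulas, so the following NumPy code shows only one plausible reading of "channels relational graph" plus spectral embedding. Everything in it is an assumption made for illustration: the function names channel_graph, spectral_embedding, and relational_loss, the cosine-similarity edge weights, the symmetric normalized Laplacian, the embedding size k, and the requirement that teacher and student feature maps have the same number of channels (a real pipeline would likely need a projection layer and a differentiable framework).

```python
import numpy as np

def channel_graph(features):
    """Build a channel-to-channel adjacency matrix from a (C, H, W) feature map.

    Assumption: edge weight = absolute cosine similarity between flattened channels.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)
    adj = np.abs(flat @ flat.T)      # non-negative edge weights
    np.fill_diagonal(adj, 0.0)       # no self-loops
    return adj

def spectral_embedding(adj, k):
    """Embed each node with k eigenvectors of the symmetric normalized Laplacian."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    # Skip the first eigenvector (eigenvalue near 0 for a connected graph).
    return eigvecs[:, 1:k + 1]

def relational_loss(teacher_feat, student_feat, k=4):
    """Mean squared difference between the Gram matrices of the two embeddings.

    Comparing Z @ Z.T instead of Z avoids the sign ambiguity of eigenvectors.
    """
    zt = spectral_embedding(channel_graph(teacher_feat), k)
    zs = spectral_embedding(channel_graph(student_feat), k)
    return float(np.mean((zt @ zt.T - zs @ zs.T) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 4, 4))   # hypothetical teacher feature map
    student = rng.normal(size=(8, 4, 4))   # hypothetical student feature map
    print(relational_loss(teacher, student))
```

In this sketch the loss is zero when both networks produce the same feature map and grows as their channel relationships diverge, which is the general idea the abstract describes as having the student mimic the teacher's structural dependencies.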

Related papers

Learning Task-Agnostic Representations through Multi-Teacher Distillation [59.488314181423284]
We introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings. Our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks.
arXiv Detail & Related papers (2025-10-21T14:36:33Z)
Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks [61.608453110751206]
We propose a fast and high-quality data-free knowledge distillation approach for graph neural networks. The proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs. ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions.
arXiv Detail & Related papers (2025-04-01T08:44:27Z)
LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
This paper proposes a novel knowledge distillation framework, Local Attention Knowledge Distillation (LAKD). LAKD more efficiently utilizes the distilled information from teacher networks, achieving higher interpretability and competitive performance. We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, and the results show that our LAKD method significantly outperforms existing methods.
arXiv Detail & Related papers (2024-08-21T09:43:27Z)
Attention-guided Feature Distillation for Semantic Segmentation [8.344263189293578]
This paper showcases the efficacy of a simple yet powerful method for utilizing refined feature maps to transfer attention. The proposed Attention-guided Feature Distillation (AttnFD) method employs the Convolutional Block Attention Module (CBAM). It achieves state-of-the-art results in terms of improving the mean Intersection over Union (mIoU) of the student network on the Pascal VOC 2012, Cityscapes, COCO, and CamVid datasets.
arXiv Detail & Related papers (2024-03-08T16:57:47Z)
Graph Relation Distillation for Efficient Biomedical Instance Segmentation [80.51124447333493]
We propose a graph relation distillation approach for efficient biomedical instance segmentation. We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level. Experimental results on a number of biomedical datasets validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-01-12T04:41:23Z)
Graph-level Protein Representation Learning by Structure Knowledge Refinement [50.775264276189695]
This paper focuses on learning representation on the whole graph level in an unsupervised manner. We propose a novel framework called Structure Knowledge Refinement (SKR) which uses data structure to determine the probability of whether a pair is positive or negative.
arXiv Detail & Related papers (2024-01-05T09:05:33Z)
Enhancing the Performance of Automated Grade Prediction in MOOC using Graph Representation Learning [3.4882560718166626]
Massive Open Online Courses (MOOCs) have gained significant traction as a rapidly growing phenomenon in online learning. Current automated assessment approaches overlook the structural links between different entities involved in the downstream tasks. We construct a unique knowledge graph for a large MOOC dataset, which will be publicly available to the research community.
arXiv Detail & Related papers (2023-10-18T19:27:39Z)
Knowledge Distillation via Token-level Relationship Graph [12.356770685214498]
We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG). By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model. We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-20T08:16:37Z)
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework. We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias, which assists in finding what meets the teacher's strengths but the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network. Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space. Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task. Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing defined knowledge representation related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network. We show that the seemingly different self-supervision task can serve as a simple yet powerful solution. By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.