Knowledge Distillation from Single to Multi Labels: an Empirical Study
- URL: http://arxiv.org/abs/2303.08360v1
- Date: Wed, 15 Mar 2023 04:39:01 GMT
- Title: Knowledge Distillation from Single to Multi Labels: an Empirical Study
- Authors: Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li,
Xiaodong Gu
- Abstract summary: We introduce a novel distillation method based on Class Activation Maps (CAMs)
Our findings indicate that the logit-based method is not well-suited for multi-label classification.
We propose that suitable dark knowledge should incorporate class-wise information and be highly correlated with the final classification results.
- Score: 14.12487391004319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) has been extensively studied in single-label
image classification. However, its efficacy for multi-label classification
remains relatively unexplored. In this study, we first investigate the
effectiveness of classical KD techniques, including logit-based and
feature-based methods, for multi-label classification. Our findings indicate
that the logit-based method is not well-suited for multi-label classification,
as the teacher fails to provide inter-category similarity information or a
regularization effect on the student model's training. Moreover, we observe that
feature-based methods struggle to convey compact information about multiple labels
simultaneously. Given these limitations, we propose that suitable dark
knowledge should incorporate class-wise information and be highly correlated
with the final classification results. To address these issues, we introduce a
novel distillation method based on Class Activation Maps (CAMs), which is both
effective and straightforward to implement. Across a wide range of settings,
CAMs-based distillation consistently outperforms other methods.
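A minimal sketch of the CAM-based distillation idea from the abstract, assuming each model's class activation maps are obtained by weighting its final feature maps with its classifier weights and matched with an MSE loss; the paper's exact loss and normalization are not specified here, so treat those details as assumptions.

```python
import torch
import torch.nn.functional as F

def class_activation_maps(feature_map, fc_weight):
    """Per-class activation maps from the last conv features.

    feature_map: (B, C, H, W) backbone output before global pooling.
    fc_weight:   (num_classes, C) weights of the final linear classifier.
    Returns:     (B, num_classes, H, W), one spatial map per class.
    """
    return torch.einsum("bchw,kc->bkhw", feature_map, fc_weight)

def cam_distillation_loss(student_feat, student_fc_w, teacher_feat, teacher_fc_w):
    """MSE between normalized student and teacher CAMs (assumed formulation)."""
    cam_s = class_activation_maps(student_feat, student_fc_w)
    cam_t = class_activation_maps(teacher_feat, teacher_fc_w)
    # Resize student CAMs to the teacher's spatial resolution if they differ.
    if cam_s.shape[-2:] != cam_t.shape[-2:]:
        cam_s = F.interpolate(cam_s, size=cam_t.shape[-2:],
                              mode="bilinear", align_corners=False)
    # Normalize each map so the loss focuses on spatial structure, not scale.
    cam_s = F.normalize(cam_s.flatten(2), dim=-1)
    cam_t = F.normalize(cam_t.flatten(2), dim=-1)
    return F.mse_loss(cam_s, cam_t.detach())
```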
Related papers
- Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems.
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
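A hedged sketch of the binary-decomposition idea from this entry: each label is treated as an independent binary problem and the teacher's per-class sigmoid probabilities are distilled into the student with a soft binary cross-entropy; the paper's actual loss and its label-wise embedding term are not reproduced.

```python
import torch
import torch.nn.functional as F

def binary_logit_distillation(student_logits, teacher_logits, temperature=1.0):
    """Distill multi-label logits as a set of per-class binary problems.

    student_logits, teacher_logits: (B, num_classes) raw scores.
    Each class is handled independently: the student's sigmoid probability
    is pushed toward the teacher's with a soft-target binary cross-entropy.
    """
    t_prob = torch.sigmoid(teacher_logits.detach() / temperature)
    s_logit = student_logits / temperature
    loss = F.binary_cross_entropy_with_logits(s_logit, t_prob)
    # Standard temperature-squared scaling to keep gradient magnitudes stable.
    return loss * (temperature ** 2)
```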
arXiv Detail & Related papers (2023-08-12T03:19:08Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label
Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
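A rough sketch of what class-aware pseudo-labeling could look like, assuming each class gets its own positive and negative thresholds (e.g., derived from an estimated class distribution); the thresholding rule used by CAP itself may differ.

```python
import torch

def class_aware_pseudo_labels(probs, pos_thresh, neg_thresh):
    """Assign per-class pseudo-labels with class-specific thresholds.

    probs:      (B, num_classes) predicted probabilities on unlabeled data.
    pos_thresh: (num_classes,) per-class threshold for positive labels.
    neg_thresh: (num_classes,) per-class threshold for negative labels.
    Returns pseudo-labels in {1, 0, -1}: positive, negative, or ignored.
    """
    labels = torch.full_like(probs, -1.0)
    labels[probs >= pos_thresh] = 1.0   # confident positives
    labels[probs <= neg_thresh] = 0.0   # confident negatives
    return labels
```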
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - FLAG: Fast Label-Adaptive Aggregation for Multi-label Classification in
Federated Learning [1.4280238304844592]
This study proposes a new multi-label federated learning framework with a Clustering-based Multi-label Data Allocation (CMDA) and a novel aggregation method, Fast Label-Adaptive Aggregation (FLAG).
The experimental results demonstrate that our methods need fewer than 50% of the training epochs and communication rounds to surpass the performance of state-of-the-art federated learning methods.
arXiv Detail & Related papers (2023-02-27T08:16:39Z) - Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance.
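A hedged sketch of instance-level plus class-level logit distillation: the usual KL divergence over each sample's class distribution, plus a KL divergence over each class's distribution across the batch. Whether this matches CLKD's exact formulation is an assumption.

```python
import torch.nn.functional as F

def instance_and_class_level_kd(student_logits, teacher_logits, T=4.0):
    """KL divergence over rows (per instance) and columns (per class)
    of the (B, num_classes) logit matrices."""
    s, t = student_logits / T, teacher_logits.detach() / T
    # Instance-level: softmax over classes for each sample.
    inst = F.kl_div(F.log_softmax(s, dim=1), F.softmax(t, dim=1),
                    reduction="batchmean")
    # Class-level: softmax over the batch for each class.
    cls = F.kl_div(F.log_softmax(s, dim=0), F.softmax(t, dim=0),
                   reduction="batchmean")
    return (inst + cls) * (T ** 2)
```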
arXiv Detail & Related papers (2022-11-27T09:27:50Z) - Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object
Detection [58.48995335728938]
We explicitly learn three types of class-agnostic commonalities between base and novel classes.
Our method can be readily integrated into most existing fine-tuning-based methods and consistently improves performance by a large margin.
arXiv Detail & Related papers (2022-07-22T16:46:51Z) - Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have attracted considerable attention in recent years.
We present self-training methods for binary and multi-class classification, as well as their variants and two related approaches.
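For reference, a generic self-training loop of the kind the survey covers: fit on labeled data, pseudo-label the most confident unlabeled samples, absorb them into the labeled set, and repeat. The confidence threshold and base classifier here are illustrative choices, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.95, max_rounds=10):
    """Generic self-training: fit, pseudo-label confident unlabeled
    samples, add them to the labeled pool, and refit."""
    X_lab, y_lab = X_labeled.copy(), y_labeled.copy()
    X_unl = X_unlabeled.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        model.fit(X_lab, y_lab)
        if len(X_unl) == 0:
            break
        probs = model.predict_proba(X_unl)
        picked = probs.max(axis=1) >= confidence
        if not picked.any():
            break  # nothing confident enough; stop early
        pseudo = model.classes_[probs[picked].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[picked]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unl = X_unl[~picked]
    return model
```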
arXiv Detail & Related papers (2022-02-24T11:40:44Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
In this paper, we propose a novel active learning framework called Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
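A hedged sketch of the acquisition step: given two auxiliary classifier heads trained to disagree, unlabeled samples are ranked by the discrepancy between the heads' predictions and the most contested ones are queried; MCDAL's exact discrepancy measure may differ.

```python
import torch

def discrepancy_acquisition(probs_head1, probs_head2, num_queries):
    """Rank unlabeled samples by prediction discrepancy between two
    auxiliary classifier heads and return the indices to label next.

    probs_head1, probs_head2: (N, num_classes) softmax outputs of the
    two auxiliary heads on the unlabeled pool.
    """
    # L1 discrepancy between the heads' class distributions per sample.
    discrepancy = (probs_head1 - probs_head2).abs().sum(dim=1)
    return torch.topk(discrepancy, k=num_queries).indices
```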
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - Multi-head Knowledge Distillation for Model Compression [65.58705111863814]
We propose a simple-to-implement method using auxiliary classifiers at intermediate layers for matching features.
We show that the proposed method outperforms prior relevant approaches presented in the literature.
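A rough sketch of the auxiliary-classifier idea, assuming small heads are attached to intermediate student features and trained to match the teacher's soft predictions; the head architecture and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryHead(nn.Module):
    """Small classifier attached to an intermediate feature map."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat):               # feat: (B, C, H, W)
        return self.fc(self.pool(feat).flatten(1))

def multi_head_kd_loss(intermediate_feats, aux_heads, teacher_logits, T=4.0):
    """KL between each auxiliary head's output and the teacher's soft targets."""
    t_soft = F.softmax(teacher_logits.detach() / T, dim=1)
    loss = 0.0
    for feat, head in zip(intermediate_feats, aux_heads):
        s_log = F.log_softmax(head(feat) / T, dim=1)
        loss = loss + F.kl_div(s_log, t_soft, reduction="batchmean") * (T ** 2)
    return loss / len(aux_heads)
```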
arXiv Detail & Related papers (2020-12-05T00:49:14Z) - Cooperative Bi-path Metric for Few-shot Learning [50.98891758059389]
We make two contributions to investigate the few-shot classification problem.
We report a simple and effective baseline trained on the base classes with standard supervised learning.
We propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy.
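A speculative sketch of a bi-path metric: one path compares query and prototype features directly, the other compares their similarity profiles over the base classes, and the two scores are mixed; the paper's actual similarity measures and weighting are not reproduced.

```python
import torch
import torch.nn.functional as F

def bi_path_scores(query_feat, proto_feat, base_class_weights, alpha=0.5):
    """Combine a direct feature-similarity path with a base-class
    correlation path for few-shot classification.

    query_feat:         (Q, D) query embeddings.
    proto_feat:         (N, D) novel-class prototypes (mean support embeddings).
    base_class_weights: (num_base, D) classifier weights of the base classes.
    """
    q = F.normalize(query_feat, dim=1)
    p = F.normalize(proto_feat, dim=1)
    b = F.normalize(base_class_weights, dim=1)
    # Path 1: direct cosine similarity between queries and prototypes.
    direct = q @ p.t()
    # Path 2: compare how queries and prototypes correlate with base classes.
    q_base = F.softmax(q @ b.t(), dim=1)
    p_base = F.softmax(p @ b.t(), dim=1)
    base_path = q_base @ p_base.t()        # (Q, N) correlation-based scores
    return alpha * direct + (1 - alpha) * base_path
```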
arXiv Detail & Related papers (2020-08-10T11:28:52Z)