Knowledge Distillation from Single to Multi Labels: an Empirical Study
- URL: http://arxiv.org/abs/2303.08360v1
- Date: Wed, 15 Mar 2023 04:39:01 GMT
- Title: Knowledge Distillation from Single to Multi Labels: an Empirical Study
- Authors: Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li,
Xiaodong Gu
- Abstract summary: We introduce a novel distillation method based on Class Activation Maps (CAMs).
Our findings indicate that the logit-based method is not well-suited for multi-label classification.
We propose that suitable dark knowledge should incorporate class-wise information and be highly correlated with the final classification results.
- Score: 14.12487391004319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) has been extensively studied in single-label
image classification. However, its efficacy for multi-label classification
remains relatively unexplored. In this study, we first investigate the
effectiveness of classical KD techniques, including logit-based and
feature-based methods, for multi-label classification. Our findings indicate
that the logit-based method is not well-suited for multi-label classification,
as the teacher fails to provide inter-category similarity information or a
regularization effect on the student model's training. Moreover, we observe
that feature-based methods struggle to convey compact information about
multiple labels simultaneously. Given these limitations, we propose that
suitable dark knowledge should incorporate class-wise information and be
highly correlated with the final classification results. To address these
issues, we introduce a
novel distillation method based on Class Activation Maps (CAMs), which is both
effective and straightforward to implement. Across a wide range of settings,
CAMs-based distillation consistently outperforms other methods.
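As a concrete illustration of the CAMs-based distillation idea described above, below is a minimal PyTorch sketch: it builds one activation map per class by weighting the backbone feature map with the classifier weights, then matches per-class normalized student and teacher maps with an L2 loss. The min-max normalization, the MSE distance, and all function names here are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_activation_maps(feat, fc_weight):
    """Compute CAMs from a conv feature map and the linear classifier weights.

    feat:      (B, C, H, W) backbone feature map before global average pooling
    fc_weight: (K, C) weight matrix of the K-class classification layer
    returns:   (B, K, H, W), one spatial activation map per class
    """
    return torch.einsum("kc,bchw->bkhw", fc_weight, feat)

def cam_distillation_loss(s_feat, s_fc_w, t_feat, t_fc_w, eps=1e-6):
    """L2 distance between per-class normalized student and teacher CAMs
    (a sketch of the general idea, not the paper's exact objective)."""
    s_cam = class_activation_maps(s_feat, s_fc_w)
    with torch.no_grad():  # the teacher only provides targets
        t_cam = class_activation_maps(t_feat, t_fc_w)
    # Match spatial resolutions if the two backbones differ.
    if s_cam.shape[-2:] != t_cam.shape[-2:]:
        s_cam = F.interpolate(s_cam, size=t_cam.shape[-2:],
                              mode="bilinear", align_corners=False)

    def spatial_minmax(cam):
        # Min-max normalize each class map so that only the spatial
        # distribution of activations is matched, not its scale.
        flat = cam.flatten(2)  # (B, K, H*W)
        flat = flat - flat.amin(dim=2, keepdim=True)
        return flat / (flat.amax(dim=2, keepdim=True) + eps)

    return F.mse_loss(spatial_minmax(s_cam), spatial_minmax(t_cam))
```

In training, this term would typically be added, with a tunable weight, to the standard multi-label binary cross-entropy loss on the ground-truth labels.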
Related papers
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes.
We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z) - Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On the one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems (a sketch of this one-vs-all decomposition appears after the list of related papers below).
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
arXiv Detail & Related papers (2023-08-12T03:19:08Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label
Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - FLAG: Fast Label-Adaptive Aggregation for Multi-label Classification in
Federated Learning [1.4280238304844592]
This study proposes a new multi-label federated learning framework with a Clustering-based Multi-label Data Allocation (CMDA) and a novel aggregation method, Fast Label-Adaptive Aggregation (FLAG).
The experimental results demonstrate that our methods need fewer than 50% of the training epochs and communication rounds to surpass the performance of state-of-the-art federated learning methods.
arXiv Detail & Related papers (2023-02-27T08:16:39Z) - Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance.
arXiv Detail & Related papers (2022-11-27T09:27:50Z) - Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object
Detection [58.48995335728938]
We explicitly learn three types of class-agnostic commonalities between base and novel classes.
Our method can be readily integrated into most existing fine-tuning-based methods and consistently improves performance by a large margin.
arXiv Detail & Related papers (2022-07-22T16:46:51Z) - Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years.
We present self-training methods for binary and multi-class classification, as well as their variants and two related approaches.
arXiv Detail & Related papers (2022-02-24T11:40:44Z) - Multi-head Knowledge Distillation for Model Compression [65.58705111863814]
We propose a simple-to-implement method using auxiliary classifiers at intermediate layers for matching features.
We show that the proposed method outperforms prior relevant approaches presented in the literature.
arXiv Detail & Related papers (2020-12-05T00:49:14Z) - Cooperative Bi-path Metric for Few-shot Learning [50.98891758059389]
We make two contributions to investigate the few-shot classification problem.
We report a simple and effective baseline trained on base classes in the standard supervised learning manner.
We propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy.
arXiv Detail & Related papers (2020-08-10T11:28:52Z)
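For the Multi-Label Knowledge Distillation entry above, the one-vs-all decomposition it describes can be sketched as follows: each of the K labels is treated as an independent binary problem, and the teacher's per-class sigmoid probabilities are distilled into the student with a binary cross-entropy term. The temperature, the uniform class weighting, and the function name are assumptions for illustration, not that paper's exact objective.

```python
import torch
import torch.nn.functional as F

def binary_logit_distillation(student_logits, teacher_logits, T=2.0):
    """One-vs-all logit distillation for multi-label classification.

    student_logits, teacher_logits: (B, K) raw scores, one per label.
    Each label is treated as an independent Bernoulli variable; the
    student matches the teacher's per-class sigmoid probabilities.
    """
    p_t = torch.sigmoid(teacher_logits / T).detach()
    log_p_s = F.logsigmoid(student_logits / T)    # log sigma(z)
    log_q_s = F.logsigmoid(-student_logits / T)   # log(1 - sigma(z))
    # Binary cross-entropy between teacher and student Bernoulli
    # distributions, averaged over classes and over the batch.
    bce = -(p_t * log_p_s + (1.0 - p_t) * log_q_s)
    return (T * T) * bce.mean()
```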