Understanding Self-Distillation and Partial Label Learning in Multi-Class Classification with Label Noise
- URL: http://arxiv.org/abs/2402.10482v1
- Date: Fri, 16 Feb 2024 07:13:12 GMT
- Title: Understanding Self-Distillation and Partial Label Learning in Multi-Class Classification with Label Noise
- Authors: Hyeonsu Jeong and Hye Won Chung
- Abstract summary: Self-distillation (SD) is the process of training a student model using the outputs of a teacher model.
Our study theoretically examines SD in multi-class classification with cross-entropy loss.
- Score: 12.636657455986144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-distillation (SD) is the process of training a student model using the
outputs of a teacher model, with both models sharing the same architecture. Our
study theoretically examines SD in multi-class classification with
cross-entropy loss, exploring both multi-round SD and SD with refined teacher
outputs, inspired by partial label learning (PLL). By deriving a closed-form
solution for the student model's outputs, we discover that SD essentially
functions as label averaging among instances with high feature correlations.
Initially beneficial, this averaging helps the model focus on feature clusters
correlated with a given instance for predicting the label. However, it leads to
diminishing performance with increasing distillation rounds. Additionally, we
demonstrate SD's effectiveness in label noise scenarios and identify the label
corruption condition and minimum number of distillation rounds needed to
achieve 100% classification accuracy. Our study also reveals that one-step
distillation with refined teacher outputs surpasses the efficacy of multi-step
SD using the teacher's direct output in high noise rate regimes.
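The label-averaging view above can be made concrete in a few lines. Below is a minimal sketch, not the paper's exact setup or closed-form solution: a linear softmax classifier trained with cross-entropy, distilled over multiple rounds, plus an assumed PLL-style refinement that thresholds the teacher's probabilities into a candidate set (the function names and the threshold rule are illustrative assumptions).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(X, Y, lr=0.1, steps=1000):
    # Linear softmax classifier trained with cross-entropy on (possibly soft) targets Y.
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)  # gradient of the mean cross-entropy
    return W

def multi_round_sd(X, Y_noisy, rounds=3):
    # Round t trains a fresh student on the round-(t-1) model's soft outputs,
    # which averages labels across instances with correlated features.
    targets = Y_noisy
    for _ in range(rounds):
        W = fit(X, targets)
        targets = softmax(X @ W)  # teacher outputs become the next round's targets
    return W

def pll_refine(P, tau=0.1):
    # Assumed PLL-style refinement: keep classes whose teacher probability
    # exceeds tau as a candidate set, then renormalize uniformly over it.
    C = (P >= tau).astype(float)
    C[C.sum(axis=1) == 0] = 1.0  # guard against an empty candidate set
    return C / C.sum(axis=1, keepdims=True)
```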
Related papers
- Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement [3.272177633069322]
Real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process.
We propose a novel framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement.
Our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions.
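A minimal sketch of the iterative refinement step described above; the actual framework also includes SimCLR pretraining, and the confidence rule here is an assumption, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def refine_labels(model, x, y_noisy, confidence=0.95):
    # One refinement pass: replace a possibly-noisy label with the model's
    # prediction whenever the model is sufficiently confident (assumed rule).
    probs = F.softmax(model(x), dim=1)
    conf, pred = probs.max(dim=1)
    return torch.where(conf > confidence, pred, y_noisy)
```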
arXiv Detail & Related papers (2024-12-06T09:56:49Z)
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.
Unlike in semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because an instance can contain multiple semantics.
We propose a dual-perspective method to generate high-quality pseudo-labels.
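A minimal sketch of why per-class thresholding replaces argmax in SSMLL; the paper's metric-adaptive thresholds are learned, whereas the `thresholds` tensor below is assumed given.

```python
import torch

@torch.no_grad()
def multilabel_pseudo_labels(logits, thresholds):
    # Each class is thresholded independently on its sigmoid score, so an
    # instance can receive several positive pseudo-labels at once.
    return (torch.sigmoid(logits) > thresholds).float()
```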
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
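A rough sketch of prototype-based pseudo-labeling with per-class selection, under assumed definitions; the paper's distribution-matching formulation is more involved.

```python
import numpy as np

def balanced_clean_subset(feats, probs, m_per_class):
    # Prototypes: probability-weighted means of the features, one per class.
    protos = (probs.T @ feats) / probs.sum(axis=0)[:, None]
    dists = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=2)
    pseudo = dists.argmin(axis=1)
    keep = []
    for k in range(protos.shape[0]):
        idx = np.where(pseudo == k)[0]
        keep.extend(idx[np.argsort(dists[idx, k])[:m_per_class]])  # m closest per class
    return np.asarray(keep), pseudo
```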
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- SLaM: Student-Label Mixing for Distillation with Unlabeled Examples [15.825078347452024]
We present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM).
Evaluated on several standard benchmarks, SLaM consistently improves over prior approaches.
We give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise.
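A one-line sketch of the mixing idea the name suggests; the mixing coefficient and the form of the combination are assumptions, not the paper's estimator.

```python
def student_label_mix(teacher_probs, student_probs, alpha=0.5):
    # Convex combination of the teacher-provided soft label and the student's
    # own prediction on an unlabeled example (alpha is an assumed constant).
    return alpha * teacher_probs + (1.0 - alpha) * student_probs
```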
arXiv Detail & Related papers (2023-02-08T00:14:44Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
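A hedged sketch of uncertainty-aware weighting for pseudo-label self-training; the paper's completeness and uncertainty modules are specific to its two-stage design and are not reproduced here.

```python
import torch

def uncertainty_weighted_loss(per_sample_loss, pseudo_probs):
    # Down-weight pseudo-labeled samples whose predictive entropy is high.
    entropy = -(pseudo_probs * pseudo_probs.clamp_min(1e-8).log()).sum(dim=1)
    return (torch.exp(-entropy) * per_sample_loss).mean()
```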
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources as labeling functions.
Existing statistical label models typically rely only on the outputs of the LFs, ignoring instance features when modeling the underlying generative process.
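For contrast, the simplest feature-agnostic label model is a plain majority vote; this is a standard baseline, not the instance-feature-aware model the paper proposes.

```python
import numpy as np

def majority_vote(L, num_classes):
    # L: (n, m) votes from m labeling functions; -1 encodes an abstention,
    # which never matches a class index and so is simply not counted.
    counts = np.stack([(L == k).sum(axis=1) for k in range(num_classes)], axis=1)
    return counts.argmax(axis=1)
```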
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
- Efficient and Flexible Sublabel-Accurate Energy Minimization [62.50191141358778]
We address the problem of minimizing a class of energy functions consisting of data and smoothness terms.
Existing continuous optimization methods can find sublabel-accurate solutions, but they are not efficient for large label spaces.
We propose an efficient sublabel-accurate method that utilizes the best properties of both continuous and discrete models.
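A toy continuous instance of the data-plus-smoothness structure (quadratic terms on a 1-D chain, minimized by gradient descent); purely illustrative, since the paper targets sublabel accuracy on large label spaces.

```python
import numpy as np

def minimize_chain_energy(f, lam=1.0, lr=0.1, steps=500):
    # E(u) = 0.5 * ||u - f||^2 + 0.5 * lam * sum_i (u[i+1] - u[i])^2
    u = f.copy()
    for _ in range(steps):
        grad = u - f
        grad[:-1] += lam * (u[:-1] - u[1:])  # smoothness, left end of each edge
        grad[1:] += lam * (u[1:] - u[:-1])   # smoothness, right end of each edge
        u -= lr * grad
    return u
```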
arXiv Detail & Related papers (2022-06-20T06:58:55Z)
- Optimizing Diffusion Rate and Label Reliability in a Graph-Based Semi-supervised Classifier [2.4366811507669124]
The Local and Global Consistency (LGC) algorithm is one of the most well-known graph-based semi-supervised (GSSL) classifiers.
We discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to leave-one-out error.
Within this framework, we propose methods to estimate label reliability and diffusion rate.
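For reference, the standard LGC closed form; the optional diagonal-zeroing line is one possible reading of "removing self-influence", not the paper's exact formulation.

```python
import numpy as np

def lgc(W, Y, alpha=0.9, remove_self_influence=False):
    # Local and Global Consistency: F = (1 - alpha) * (I - alpha * S)^{-1} Y,
    # with S = D^{-1/2} W D^{-1/2} the symmetrically normalized affinity.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    P = (1 - alpha) * np.linalg.inv(np.eye(len(W)) - alpha * S)
    if remove_self_influence:
        np.fill_diagonal(P, 0.0)  # assumed reading of "removing self-influence"
    return (P @ Y).argmax(axis=1)
```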
arXiv Detail & Related papers (2022-01-10T16:58:52Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method significantly improves performance compared with a supervised baseline trained on labeled data only.
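The mean-teacher update at the core of such frameworks; the EMA rule is standard, while the mask guidance and perturbation-sensitive sample mining are the paper's additions and are not shown.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    # Teacher weights track an exponential moving average of student weights.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)
```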
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.