Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2001.06001v2
- Date: Thu, 10 Dec 2020 16:14:35 GMT
- Title: Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised
Learning
- Authors: Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez
- Abstract summary: We revisit the idea of pseudo-labeling in the context of semi-supervised learning.
Pseudo-labeling works by assigning labels to samples in the unlabeled set using a model trained on the labeled samples and any previously pseudo-labeled samples, repeating this process in a self-training cycle.
We obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 68.87% top-1 accuracy on Imagenet-ILSVRC using only 10% of the labeled samples.
- Score: 27.258077365554474
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper we revisit the idea of pseudo-labeling in the context of
semi-supervised learning where a learning algorithm has access to a small set
of labeled samples and a large set of unlabeled samples. Pseudo-labeling works
by applying pseudo-labels to samples in the unlabeled set by using a model
trained on the combination of the labeled samples and any previously
pseudo-labeled samples, and iteratively repeating this process in a
self-training cycle. Current methods seem to have abandoned this approach in
favor of consistency regularization methods that train models under a
combination of different styles of self-supervised losses on the unlabeled
samples and standard supervised losses on the labeled samples. We empirically
demonstrate that pseudo-labeling can in fact be competitive with the
state-of-the-art, while being more resilient to out-of-distribution samples in
the unlabeled set. We identify two key factors that allow pseudo-labeling to
achieve such remarkable results: (1) applying curriculum learning principles and
(2) avoiding concept drift by restarting model parameters before each
self-training cycle. We obtain 94.91% accuracy on CIFAR-10 using only 4,000
labeled samples, and 68.87% top-1 accuracy on Imagenet-ILSVRC using only 10% of
the labeled samples. The code is available at
https://github.com/uvavision/Curriculum-Labeling
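As a rough illustration of the self-training loop described above, here is a minimal sketch in Python (the `build_model`, `train`, and `predict` helpers are placeholders, not the repository's actual API). It highlights the two factors the authors identify: a confidence-percentile curriculum over the unlabeled pool and a full parameter restart before each cycle.

```python
import numpy as np

def curriculum_labeling(build_model, train, predict,
                        x_lab, y_lab, x_unlab, num_cycles=5):
    """Pseudo-labeling with a percentile curriculum and parameter restarts.

    build_model() returns a freshly initialized model (restarting it avoids
    concept drift), train(model, x, y) fits it, and predict(model, x) returns
    a class-probability matrix of shape (len(x), n_classes).
    """
    x_train, y_train = x_lab, y_lab
    for cycle in range(num_cycles + 1):
        # Restart from scratch: re-initialize all parameters every cycle.
        model = build_model()
        train(model, x_train, y_train)
        if cycle == num_cycles:
            break  # final model is trained on the full curriculum

        # Score the unlabeled pool with the freshly trained model.
        probs = predict(model, x_unlab)
        conf = probs.max(axis=1)
        pseudo = probs.argmax(axis=1)

        # Curriculum: admit only the most confident fraction of the unlabeled
        # set, growing it each cycle (top 20%, 40%, ..., 100% for 5 cycles).
        keep_frac = (cycle + 1) / num_cycles
        threshold = np.percentile(conf, 100 * (1 - keep_frac))
        mask = conf >= threshold

        # The next cycle trains on labeled data plus the admitted pseudo-labels.
        x_train = np.concatenate([x_lab, x_unlab[mask]])
        y_train = np.concatenate([y_lab, pseudo[mask]])
    return model
```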
Related papers
- Pairwise Similarity Distribution Clustering for Noisy Label Learning [0.0]
Noisy label learning aims to train deep neural networks using a large number of samples with noisy labels.
We propose a simple yet effective sample selection algorithm that divides the training samples into a clean set and a noisy set.
Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.
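The summary does not spell out the selection criterion (the paper's is built on pairwise similarity distributions). Purely as an illustrative stand-in, a common way to realize a clean/noisy split in the noisy-label literature is a two-component Gaussian mixture over per-sample losses:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss):
    """Illustrative clean/noisy split (not the paper's criterion): fit a
    2-component GMM over per-sample training losses and treat membership in
    the low-mean (low-loss) component as evidence that a label is clean."""
    losses = np.asarray(per_sample_loss, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    clean_mask = p_clean > 0.5
    return clean_mask, ~clean_mask
```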
arXiv Detail & Related papers (2024-04-02T11:30:22Z)
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method dubbed ShrinkMatch to learn from uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
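A loose reading of that mechanism, not the paper's code, could look like the following sketch: the most confusing classes are dropped until the original top-1 class becomes confident in the renormalized space, and the strongly augmented view is then supervised only within that shrunk space.

```python
import torch
import torch.nn.functional as F

def shrunk_space_consistency(logits_weak, logits_strong, threshold=0.95):
    """Illustrative sketch of consistency in a shrunk class space: confusing
    classes (ranks 2, 3, ...) are removed until the renormalized probability
    of the original top-1 class clears `threshold`, then cross-entropy with
    the top-1 pseudo-label is applied to the strong view in that space."""
    probs = logits_weak.softmax(dim=-1).detach()
    sorted_p, order = probs.sort(dim=-1, descending=True)
    n_cls = probs.size(1)
    losses = []
    for i in range(probs.size(0)):
        p = sorted_p[i]
        for k in range(1, n_cls):
            tail = p[k:].sum()                 # classes kept besides the top-1
            if p[0] / (p[0] + tail) >= threshold:
                break                          # shrunk space is confident enough
        keep = torch.cat([order[i, :1], order[i, k:]])  # top-1 + unconfusing tail
        target = torch.zeros(1, dtype=torch.long, device=logits_weak.device)
        losses.append(F.cross_entropy(logits_strong[i, keep].unsqueeze(0), target))
    return torch.stack(losses).mean()
```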
arXiv Detail & Related papers (2023-08-13T14:05:24Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
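Only the weighting idea is sketched below, in an assumed form: unlabeled images receive pseudo-labels from an auxiliary classifier (a placeholder module here), and each instance's contribution to the adversarial loss is scaled by a soft, confidence-derived weight instead of being hard-filtered.

```python
import torch

def soft_curriculum_weights(classifier, x_unlabeled):
    """Hypothetical sketch of the soft-curriculum assignment: each unlabeled
    image gets a pseudo-label from an auxiliary classifier, and its weight in
    the adversarial loss is the classifier's confidence."""
    with torch.no_grad():
        probs = classifier(x_unlabeled).softmax(dim=-1)
    weights, pseudo_labels = probs.max(dim=-1)
    return weights, pseudo_labels

def weighted_loss(per_sample_loss, weights):
    """Instance-weighted aggregation of a per-sample adversarial loss."""
    return (weights * per_sample_loss).mean()
```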
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
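One plausible reading of class-aware pseudo-labeling, shown here as an assumed sketch rather than the paper's exact rule, is to give each class its own threshold so that per-class positive rates roughly track class priors estimated from the labeled set:

```python
import numpy as np

def class_aware_pseudo_labels(probs_unlab, class_priors):
    """Illustrative class-aware pseudo-labeling for multi-label data: instead
    of one global threshold, each class gets its own cutoff so that the
    fraction of positive pseudo-labels per class roughly matches that class's
    prior (assumed to be estimated from the labeled set)."""
    probs_unlab = np.asarray(probs_unlab, dtype=float)
    pseudo = np.zeros_like(probs_unlab, dtype=int)
    for j, prior in enumerate(class_priors):
        cutoff = np.quantile(probs_unlab[:, j], 1.0 - prior)  # per-class threshold
        pseudo[probs_unlab[:, j] >= cutoff, j] = 1
    return pseudo
```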
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
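In the positive-unlabeled setting, a concrete way to read label-distribution consistency is to match the predicted fraction of positives on unlabeled data to the positive class prior; a minimal sketch under that assumption (with `positive_prior` taken as given):

```python
import torch

def label_distribution_consistency(probs_unlabeled, positive_prior):
    """Sketch of a label-distribution consistency term for PU learning: the
    mean predicted positive probability over unlabeled data is pushed toward
    the known positive class prior."""
    predicted_positive_rate = probs_unlabeled.mean()
    return (predicted_positive_rate - positive_prior).abs()
```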
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Impact of Strategic Sampling and Supervision Policies on Semi-supervised Learning [23.4909421082857]
In semi-supervised representation learning frameworks, when labelled data are very scarce, the quality and representativeness of these samples become increasingly important.
The existing literature on semi-supervised learning randomly samples a limited number of data points for labelling.
All these labelled samples are then used along with the unlabelled data throughout the training process.
arXiv Detail & Related papers (2022-11-27T18:29:54Z)
- Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation [21.163070161951868]
Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training.
Current SSL approaches use a model first trained with supervision to generate predictions for unlabelled images, called pseudo-labels.
We use three mechanisms to control pseudo-label noise and errors.
arXiv Detail & Related papers (2022-10-19T09:46:27Z)
- An analysis of over-sampling labeled data in semi-supervised learning with FixMatch [66.34968300128631]
Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
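The two mini-batch construction schemes being compared can be sketched as follows (batch sizes are illustrative):

```python
import random

def oversampled_batch(labeled, unlabeled, n_labeled=64, n_unlabeled=448):
    """Common SSL practice: each mini-batch reserves a fixed quota for labeled
    examples, over-sampling them relative to their share of the training set."""
    return random.sample(labeled, n_labeled) + random.sample(unlabeled, n_unlabeled)

def uniform_batch(labeled, unlabeled, batch_size=512):
    """Alternative setting: the mini-batch is drawn uniformly from all
    training data, labeled or not."""
    return random.sample(labeled + unlabeled, batch_size)
```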
arXiv Detail & Related papers (2022-01-03T12:22:26Z)
- Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework that learns a deep model to suppress noise by generating prior knowledge about the samples.
Our framework preserves more of the informative hard clean samples in the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise, as well as on real-world noisy datasets such as WebVision and ANIMAL-10N.
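A sketch of the neighborhood-consistency selection, with assumed k-nearest-neighbor specifics:

```python
import numpy as np

def select_by_neighborhood_agreement(features, labels, k=10, min_agreement=0.5):
    """A sample is treated as clean if its annotated label agrees with enough
    of the labels of its k nearest neighbors in feature space (brute-force
    distances; `k` and `min_agreement` are illustrative choices)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]     # k nearest neighbors per sample
    agreement = (labels[neighbors] == labels[:, None]).mean(axis=1)
    return agreement >= min_agreement                # boolean mask of selected samples
```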
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels.
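A sketch of the incremental-label phase under an assumed schedule: classes are revealed a few at a time, and samples of not-yet-introduced classes are mapped to a single placeholder label.

```python
def incremental_label_view(labels, num_classes, epoch, epochs_per_step=5):
    """Classes are revealed progressively as training proceeds; samples whose
    class has not yet been introduced are grouped under a single placeholder
    label (`num_classes` is used as that placeholder index; the step size is
    illustrative)."""
    revealed = min(num_classes, 1 + epoch // epochs_per_step)
    return [y if y < revealed else num_classes for y in labels]
```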
arXiv Detail & Related papers (2020-01-13T21:00:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.