An analysis of over-sampling labeled data in semi-supervised learning with FixMatch
- URL: http://arxiv.org/abs/2201.00604v1
- Date: Mon, 3 Jan 2022 12:22:26 GMT
- Title: An analysis of over-sampling labeled data in semi-supervised learning with FixMatch
- Authors: Miquel Martí i Rabadán, Sebastian Bujwid, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki
- Abstract summary: Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
- Score: 66.34968300128631
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most semi-supervised learning methods over-sample labeled data when
constructing training mini-batches. This paper studies whether this common
practice improves learning and how. We compare it to an alternative setting
where each mini-batch is uniformly sampled from all the training data, labeled
or not, which greatly reduces direct supervision from true labels in typical
low-label regimes. However, this simpler setting can also be seen as more
general and even necessary in multi-task problems where over-sampling labeled
data would become intractable. Our experiments on semi-supervised CIFAR-10
image classification using FixMatch show a performance drop when using the
uniform sampling approach which diminishes when the amount of labeled data or
the training time increases. Further, we analyse the training dynamics to
understand how over-sampling of labeled data compares to uniform sampling. Our
main finding is that over-sampling is especially beneficial early in training
but gets less important in the later stages when more pseudo-labels become
correct. Nevertheless, we also find that keeping some true labels remains
important to avoid the accumulation of confirmation errors from incorrect
pseudo-labels.
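The contrast between the two batching schemes is easy to make concrete. Below is a minimal Python sketch (the helper names are ours; the quota of 64 labeled examples with ratio mu=7 follows FixMatch's CIFAR-10 defaults) of FixMatch-style over-sampling versus uniform sampling from the union of labeled and unlabeled data.

```python
import random

def oversampled_batch(labeled, unlabeled, n_labeled=64, mu=7):
    """FixMatch-style batch construction: every batch carries a fixed
    quota of labeled examples plus mu times as many unlabeled ones, so
    labeled data is heavily over-sampled in low-label regimes."""
    batch_l = random.choices(labeled, k=n_labeled)        # sampled with replacement
    batch_u = random.choices(unlabeled, k=mu * n_labeled)
    return batch_l, batch_u

def uniform_batch(labeled, unlabeled, batch_size=512):
    """Alternative studied in the paper: draw the whole batch uniformly
    from all training data, labeled or not. With e.g. 40 labels out of
    50,000 CIFAR-10 images, most batches then contain no label at all."""
    pool = [(x, True) for x in labeled] + [(x, False) for x in unlabeled]
    drawn = random.choices(pool, k=batch_size)
    batch_l = [x for x, is_labeled in drawn if is_labeled]
    batch_u = [x for x, is_labeled in drawn if not is_labeled]
    return batch_l, batch_u
```

Under uniform sampling the expected number of labeled examples per batch is batch_size * |labeled| / |all data|, which for 40 labels and a 512-example batch is below one; this is the reduction in direct supervision the abstract refers to.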
Related papers
- Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction [12.317154103998433]
Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples.
Our approach explicitly distinguishes between clean vs. noisy and easy vs. hard samples (see the sketch after this entry).
Corrected hard samples, along with the easy samples, are used as labeled data in subsequent semi-supervised training.
arXiv Detail & Related papers (2024-07-10T03:00:14Z)
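As a rough illustration of the loss-based categorization the entry above starts from, here is the classic small-loss heuristic in a few lines of Python; the fixed threshold is our simplification, since such methods typically fit the loss distribution rather than hard-coding a cut-off.

```python
def split_by_loss(per_sample_losses, threshold):
    """Small-loss heuristic from noisy-label learning: samples the model
    fits easily (low loss) are treated as clean, high-loss samples as
    noisy. A fixed threshold is a simplification of fitting the loss
    distribution."""
    clean = [i for i, loss in enumerate(per_sample_losses) if loss <= threshold]
    noisy = [i for i, loss in enumerate(per_sample_losses) if loss > threshold]
    return clean, noisy
```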
- Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples [9.360376286221943]
We introduce an adaptive batch selection algorithm tailored to multi-label deep learning models.
Our method converges faster and performs better than random batch selection.
arXiv Detail & Related papers (2024-03-27T02:00:18Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our findings highlight the utility of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
Across multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit semi-supervised supervision (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
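To make the one-bit setting of the entry above concrete: the learner guesses a class and the annotator answers only yes or no, i.e. one bit per query. A minimal sketch (the function names are ours), including the negative-label-suppression idea of removing a class the annotator has rejected:

```python
def one_bit_query(annotator_label, guessed_class):
    """One bit of supervision: the annotator says whether the guess is
    correct; the true class is never revealed on a 'no'."""
    return guessed_class == annotator_label

def suppress_negative(probs, rejected_class):
    """Negative label suppression: a class known to be wrong gets its
    probability mass removed, and the rest is renormalised."""
    probs = list(probs)
    probs[rejected_class] = 0.0
    total = sum(probs) or 1.0  # guard against an all-zero edge case
    return [p / total for p in probs]
```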
- Something for (almost) nothing: Improving deep ensemble calibration using unlabeled data [4.503508912578133]
We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data.
Our approach is extremely simple to implement: for each point in an unlabeled set, each ensemble member fits a different randomly selected label (see the sketch after this entry).
arXiv Detail & Related papers (2023-10-04T15:21:54Z)
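The mechanism in the entry above fits in a few lines. A sketch under the assumption that each ensemble member then trains on the labeled set plus its own randomly labeled copy of the unlabeled set (the function name is ours):

```python
import random

def random_label_copies(unlabeled, n_members, n_classes, seed=0):
    """For every unlabeled point, draw an independent random label for
    each ensemble member. Disagreement on these throwaway targets keeps
    the members diverse, which the paper links to better calibration."""
    rng = random.Random(seed)
    return [[(x, rng.randrange(n_classes)) for x in unlabeled]
            for _ in range(n_members)]
```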
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples differ in the number of epochs required for them to be consistently and correctly classified (see the sketch after this entry).
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
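The per-sample statistic behind the entry above can be sketched directly (the function name is ours): record each epoch's prediction for a sample and find the first epoch from which it stays correct.

```python
def first_stable_epoch(pred_history, given_label):
    """Earliest epoch from which the sample is predicted as its given
    label at every subsequent recorded epoch. Mislabeled examples tend
    to reach this point later than clean ones, which Late Stopping uses
    to avoid confidently fitting them."""
    for epoch in range(len(pred_history)):
        if all(p == given_label for p in pred_history[epoch:]):
            return epoch
    return None  # never consistently fit within the recorded history
```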
- Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [26.069534478556527]
Semi-Supervised Learning (SSL) has shown a strong ability to exploit unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
arXiv Detail & Related papers (2021-06-01T03:58:18Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels (a schedule sketch follows).
arXiv Detail & Related papers (2020-01-13T21:00:46Z)
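One way to read LILAC's first phase is as a schedule over the output label set; the sketch below is our interpretation of the abstract, with the step size and class ordering as assumptions.

```python
def incremental_label_schedule(classes, step):
    """Yield growing subsets of the output labels, as in a curriculum
    that incrementally increases the number of unique labels; samples
    whose class is not yet introduced would carry a placeholder target
    in the interim (an assumption, not spelled out in the abstract)."""
    for end in range(step, len(classes) + step, step):
        yield classes[:min(end, len(classes))]

# Example: CIFAR-10 class indices, two at a time:
# list(incremental_label_schedule(list(range(10)), 2))
# -> [[0, 1], [0, 1, 2, 3], ..., [0, 1, ..., 9]]
```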