An analysis of over-sampling labeled data in semi-supervised learning with FixMatch
- URL: http://arxiv.org/abs/2201.00604v1
- Date: Mon, 3 Jan 2022 12:22:26 GMT
- Title: An analysis of over-sampling labeled data in semi-supervised learning with FixMatch
- Authors: Miquel Martí i Rabadán, Sebastian Bujwid, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki
- Abstract summary: Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
- Score: 66.34968300128631
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most semi-supervised learning methods over-sample labeled data when
constructing training mini-batches. This paper studies whether this common
practice improves learning and how. We compare it to an alternative setting
where each mini-batch is uniformly sampled from all the training data, labeled
or not, which greatly reduces direct supervision from true labels in typical
low-label regimes. However, this simpler setting can also be seen as more
general and even necessary in multi-task problems where over-sampling labeled
data would become intractable. Our experiments on semi-supervised CIFAR-10
image classification using FixMatch show a performance drop when using the
uniform sampling approach which diminishes when the amount of labeled data or
the training time increases. Further, we analyse the training dynamics to
understand how over-sampling of labeled data compares to uniform sampling. Our
main finding is that over-sampling is especially beneficial early in training
but gets less important in the later stages when more pseudo-labels become
correct. Nevertheless, we also find that keeping some true labels remains
important to avoid the accumulation of confirmation errors from incorrect
pseudo-labels.
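The contrast between the two batching schemes is easy to make concrete. Below is a minimal Python sketch (the helper names are ours; the quota of 64 labeled examples with ratio mu=7 follows FixMatch's CIFAR-10 defaults) of FixMatch-style over-sampling versus uniform sampling from the union of labeled and unlabeled data.

```python
import random

def oversampled_batch(labeled, unlabeled, n_labeled=64, mu=7):
    """FixMatch-style batch construction: every batch carries a fixed
    quota of labeled examples plus mu times as many unlabeled ones, so
    labeled data is heavily over-sampled in low-label regimes."""
    batch_l = random.choices(labeled, k=n_labeled)        # sampled with replacement
    batch_u = random.choices(unlabeled, k=mu * n_labeled)
    return batch_l, batch_u

def uniform_batch(labeled, unlabeled, batch_size=512):
    """Alternative studied in the paper: draw the whole batch uniformly
    from all training data, labeled or not. With e.g. 40 labels out of
    50,000 CIFAR-10 images, most batches then contain no label at all."""
    pool = [(x, True) for x in labeled] + [(x, False) for x in unlabeled]
    drawn = random.choices(pool, k=batch_size)
    batch_l = [x for x, is_labeled in drawn if is_labeled]
    batch_u = [x for x, is_labeled in drawn if not is_labeled]
    return batch_l, batch_u
```

Under uniform sampling the expected number of labeled examples per batch is batch_size * |labeled| / |all data|, which for 40 labels and a 512-example batch is below one; this is the reduction in direct supervision the abstract refers to.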
Related papers
- Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction [12.317154103998433]
Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples.
Our approach explicitly distinguishes between clean vs. noisy and easy vs. hard samples (see the sketch after this entry).
Corrected hard samples, along with the easy samples, are used as labeled data in subsequent semi-supervised training.
arXiv Detail & Related papers (2024-07-10T03:00:14Z)
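As a rough illustration of the loss-based categorization the entry above starts from, here is the classic small-loss heuristic in a few lines of Python; the fixed threshold is our simplification, since such methods typically fit the loss distribution rather than hard-coding a cut-off.

```python
def split_by_loss(per_sample_losses, threshold):
    """Small-loss heuristic from noisy-label learning: samples the model
    fits easily (low loss) are treated as clean, high-loss samples as
    noisy. A fixed threshold is a simplification of fitting the loss
    distribution."""
    clean = [i for i, loss in enumerate(per_sample_losses) if loss <= threshold]
    noisy = [i for i, loss in enumerate(per_sample_losses) if loss > threshold]
    return clean, noisy
```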
- Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples [9.360376286221943]
We introduce an adaptive batch selection algorithm tailored to multi-label deep learning models.
Our method converges faster and performs better than random batch selection.
arXiv Detail & Related papers (2024-03-27T02:00:18Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our findings highlight the utility of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
Across multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit semi-supervised supervision (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
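To make the one-bit setting of the entry above concrete: the learner guesses a class and the annotator answers only yes or no, i.e. one bit per query. A minimal sketch (the function names are ours), including the negative-label-suppression idea of removing a class the annotator has rejected:

```python
def one_bit_query(annotator_label, guessed_class):
    """One bit of supervision: the annotator says whether the guess is
    correct; the true class is never revealed on a 'no'."""
    return guessed_class == annotator_label

def suppress_negative(probs, rejected_class):
    """Negative label suppression: a class known to be wrong gets its
    probability mass removed, and the rest is renormalised."""
    probs = list(probs)
    probs[rejected_class] = 0.0
    total = sum(probs) or 1.0  # guard against an all-zero edge case
    return [p / total for p in probs]
```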
- Something for (almost) nothing: Improving deep ensemble calibration using unlabeled data [4.503508912578133]
We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data.
Our approach is extremely simple to implement: for each point in an unlabeled set, each ensemble member fits a different randomly selected label (see the sketch after this entry).
arXiv Detail & Related papers (2023-10-04T15:21:54Z)
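The mechanism in the entry above fits in a few lines. A sketch under the assumption that each ensemble member then trains on the labeled set plus its own randomly labeled copy of the unlabeled set (the function name is ours):

```python
import random

def random_label_copies(unlabeled, n_members, n_classes, seed=0):
    """For every unlabeled point, draw an independent random label for
    each ensemble member. Disagreement on these throwaway targets keeps
    the members diverse, which the paper links to better calibration."""
    rng = random.Random(seed)
    return [[(x, rng.randrange(n_classes)) for x in unlabeled]
            for _ in range(n_members)]
```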
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples differ in the number of epochs required for them to be consistently and correctly classified (see the sketch after this entry).
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
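The per-sample statistic behind the entry above can be sketched directly (the function name is ours): record each epoch's prediction for a sample and find the first epoch from which it stays correct.

```python
def first_stable_epoch(pred_history, given_label):
    """Earliest epoch from which the sample is predicted as its given
    label at every subsequent recorded epoch. Mislabeled examples tend
    to reach this point later than clean ones, which Late Stopping uses
    to avoid confidently fitting them."""
    for epoch in range(len(pred_history)):
        if all(p == given_label for p in pred_history[epoch:]):
            return epoch
    return None  # never consistently fit within the recorded history
```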
- Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [26.069534478556527]
Semi-Supervised Learning (SSL) has shown a strong ability to exploit unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
arXiv Detail & Related papers (2021-06-01T03:58:18Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels (a schedule sketch follows).
arXiv Detail & Related papers (2020-01-13T21:00:46Z)
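One way to read LILAC's first phase is as a schedule over the output label set; the sketch below is our interpretation of the abstract, with the step size and class ordering as assumptions.

```python
def incremental_label_schedule(classes, step):
    """Yield growing subsets of the output labels, as in a curriculum
    that incrementally increases the number of unique labels; samples
    whose class is not yet introduced would carry a placeholder target
    in the interim (an assumption, not spelled out in the abstract)."""
    for end in range(step, len(classes) + step, step):
        yield classes[:min(end, len(classes))]

# Example: CIFAR-10 class indices, two at a time:
# list(incremental_label_schedule(list(range(10)), 2))
# -> [[0, 1], [0, 1, 2, 3], ..., [0, 1, ..., 9]]
```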