SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning
- URL: http://arxiv.org/abs/2301.10921v1
- Date: Thu, 26 Jan 2023 03:53:25 GMT
- Title: SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning
- Authors: Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele,
Xing Xie, Bhiksha Raj, Marios Savvides
- Abstract summary: This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
- Score: 101.86916775218403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The critical challenge of Semi-Supervised Learning (SSL) is how to
effectively leverage the limited labeled data and massive unlabeled data to
improve the model's generalization performance. In this paper, we first revisit
the popular pseudo-labeling methods via a unified sample weighting formulation
and demonstrate the inherent quantity-quality trade-off problem of
pseudo-labeling with thresholding, which may prohibit learning. To this end, we
propose SoftMatch to overcome the trade-off by maintaining both high quantity
and high quality of pseudo-labels during training, effectively exploiting the
unlabeled data. We derive a truncated Gaussian function to weight samples based
on their confidence, which can be viewed as a soft version of the confidence
threshold. We further enhance the utilization of weakly-learned classes by
proposing a uniform alignment approach. In experiments, SoftMatch shows
substantial improvements across a wide variety of benchmarks, including image,
text, and imbalanced classification.
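A minimal sketch of the soft-threshold idea in the abstract, contrasting a hard confidence threshold with a truncated Gaussian weight. It illustrates the idea and is not the authors' implementation. The function names, the fixed `mu` and `sigma` values, and the threshold `tau=0.95` are assumptions made for this example; the abstract does not say how these parameters are obtained, and the uniform alignment step is not shown.

```python
import numpy as np


def hard_threshold_weight(probs, tau=0.95):
    """Hard thresholding: weight 1 if the top-class confidence reaches tau, else 0."""
    conf = probs.max(axis=1)
    return (conf >= tau).astype(float)


def truncated_gaussian_weight(probs, mu, sigma):
    """Soft weighting: Gaussian in the confidence, truncated to 1 at and above mu.

    mu and sigma are passed in as fixed numbers here (an assumption of this
    sketch); they control where the weight saturates and how fast it decays.
    """
    conf = probs.max(axis=1)
    gauss = np.exp(-((conf - mu) ** 2) / (2.0 * sigma ** 2))
    return np.where(conf >= mu, 1.0, gauss)


# Three hypothetical softmax outputs on unlabeled samples (3 classes).
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.60, 0.30, 0.10],  # moderately confident
    [0.40, 0.35, 0.25],  # uncertain
])

hard = hard_threshold_weight(probs)
soft = truncated_gaussian_weight(probs, mu=0.9, sigma=0.2)

print("hard weights:", hard)
print("soft weights:", soft)
```

With the hard threshold, only the first sample contributes to the unlabeled loss (high quality, low quantity). With the soft weight, all three samples contribute, and less confident predictions are down-weighted smoothly rather than discarded.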
Related papers
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.
Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance.
We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
- Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification [6.920336485308536]
Pseudo-labeling-based semi-supervised approaches suffer from two problems in image classification.
We develop a self-adaptive threshold pseudo-labeling strategy, in which the threshold for each class is dynamically adjusted to increase the number of reliable samples.
In order to effectively utilise unlabeled data with confidence below the thresholds, we propose an unreliable sample contrastive loss.
arXiv Detail & Related papers (2024-07-04T03:04:56Z)
- Boosting Semi-Supervised Learning by bridging high and low-confidence predictions [4.18804572788063]
Pseudo-labeling is a crucial technique in semi-supervised learning (SSL).
We propose a new method called ReFixMatch, which aims to utilize all of the unlabeled data during training.
arXiv Detail & Related papers (2023-08-15T00:27:18Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning [34.062061310242385]
We present a new perspective of pseudo-labeling for imbalanced semi-supervised learning (SSL).
We measure whether an unlabeled sample is likely to be "in-distribution" or "out-of-distribution".
Experiments demonstrate that our energy-based pseudo-labeling method, InPL, significantly outperforms confidence-based methods on imbalanced SSL benchmarks.
arXiv Detail & Related papers (2023-03-13T16:45:41Z)
- SLaM: Student-Label Mixing for Distillation with Unlabeled Examples [15.825078347452024]
We present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM).
Evaluated on several standard benchmarks, SLaM consistently improves over prior approaches.
We give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise.
arXiv Detail & Related papers (2023-02-08T00:14:44Z)
- PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjustment scheme that dynamically alters the score thresholds of positive and negative pseudo-labels for each class during training.
We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)