Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
- URL: http://arxiv.org/abs/2303.11066v1
- Date: Mon, 20 Mar 2023 12:44:11 GMT
- Title: Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
- Authors: Yuhao Chen, Xin Tan, Borui Zhao, Zhaowei Chen, Renjie Song, Jiajun
Liang, Xuequan Lu
- Abstract summary: Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets.
We propose two novel techniques: Entropy Meaning Loss (EML) and Adaptive Negative Learning (ANL).
We integrate these techniques with FixMatch, and develop a simple yet powerful framework called FullMatch.
- Score: 21.6350640726058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) has attracted enormous attention due to its
vast potential of mitigating the dependence on large labeled datasets. The
latest methods (e.g., FixMatch) use a combination of consistency regularization
and pseudo-labeling to achieve remarkable success. However, these methods all
waste hard examples, since every pseudo-label must pass a high confidence
threshold to filter out noisy ones; examples with ambiguous predictions
therefore never contribute to the training phase. To better leverage all
unlabeled examples, we propose two novel techniques:
Entropy Meaning Loss (EML) and Adaptive Negative Learning (ANL). EML
incorporates the prediction distribution of non-target classes into the
optimization objective to avoid competition with the target class, thereby
generating more high-confidence predictions for pseudo-label selection. ANL
introduces an additional negative pseudo-label for all unlabeled data to
leverage low-confidence examples, allocating this label adaptively by
dynamically evaluating the model's top-k performance. EML and ANL introduce
no additional parameters or hyperparameters. We integrate these
techniques with FixMatch, and develop a simple yet powerful framework called
FullMatch. Extensive experiments on several common SSL benchmarks
(CIFAR-10/100, SVHN, STL-10 and ImageNet) demonstrate that FullMatch exceeds
FixMatch by a large margin. Integrated with FlexMatch (an advanced
FixMatch-based framework), we achieve state-of-the-art performance. Source code
is at https://github.com/megvii-research/FullMatch.
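The ANL idea summarized above (pick k from the model's current top-k performance, then treat classes ranked below the top k as negative labels for every unlabeled example) can be sketched roughly as follows. This is a minimal illustration with assumed helper names, not the authors' implementation, and it stands in for ANL's dynamic evaluation with a simple top-k accuracy check on held-out labeled data:

```python
import numpy as np

def adaptive_k(val_probs, val_labels, target_acc=0.99):
    """Pick the smallest k whose top-k accuracy on held-out labeled data
    reaches target_acc (a stand-in for ANL's dynamic evaluation)."""
    n_classes = val_probs.shape[1]
    ranked = np.argsort(-val_probs, axis=1)          # classes, best first
    for k in range(1, n_classes + 1):
        topk_hit = (ranked[:, :k] == val_labels[:, None]).any(axis=1)
        if topk_hit.mean() >= target_acc:
            return k
    return n_classes

def negative_pseudo_labels(unlab_probs, k):
    """Mark every class ranked below the top-k as a negative label (True),
    so a negative-learning loss can push its probability down."""
    ranked = np.argsort(-unlab_probs, axis=1)
    neg = np.ones_like(unlab_probs, dtype=bool)
    rows = np.arange(unlab_probs.shape[0])[:, None]
    neg[rows, ranked[:, :k]] = False                 # top-k stay candidates
    return neg
```

Because k is derived from the model's own accuracy, low-confidence examples still receive supervision (which classes they are *not*) even when no positive pseudo-label clears the threshold.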
Related papers
- AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning [5.0823084858349485]
We present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data.
The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-06-22T06:59:52Z)
- RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations [52.549807652527306]
This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL)
RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data.
We establish a theoretical generalization bound for RankMatch, and through extensive experiments, demonstrate its superiority in performance against existing SSLDL methods.
arXiv Detail & Related papers (2023-12-11T12:47:29Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning [46.95063831057502]
We propose FreeMatch to define and adjust the confidence threshold in a self-adaptive manner according to the model's learning status.
FreeMatch achieves 5.78%, 13.59%, and 1.28% error rate reductions over the latest state-of-the-art method FlexMatch on CIFAR-10 with 1 label per class.
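A self-adaptive threshold of the kind FreeMatch describes can be illustrated as an exponential moving average of the model's confidence; this is a deliberately simplified sketch (the actual method also maintains per-class adjustments), with the momentum value chosen for illustration:

```python
import numpy as np

def update_threshold(tau, batch_probs, momentum=0.999):
    """Move the global confidence threshold toward the batch's mean
    max-probability, so it rises as the model grows more confident
    and admits more pseudo-labels early in training."""
    batch_conf = batch_probs.max(axis=1).mean()
    return momentum * tau + (1 - momentum) * batch_conf
```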
arXiv Detail & Related papers (2022-05-15T10:07:52Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data for training.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
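The selection criterion UPS describes can be sketched as a two-sided filter: keep a pseudo-label only if its mean confidence over several stochastic forward passes (e.g. MC dropout) is high *and* the spread of that confidence is low. The function name, thresholds, and the std-based uncertainty proxy below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_pseudo_labels(mc_probs, conf_min=0.9, unc_max=0.05):
    """mc_probs: (passes, examples, classes) probabilities from several
    stochastic forward passes. Keep an example only if its mean confidence
    is high AND the std of that confidence across passes is low."""
    mean_probs = mc_probs.mean(axis=0)
    labels = mean_probs.argmax(axis=1)
    conf = mean_probs.max(axis=1)
    idx = np.arange(mc_probs.shape[1])
    unc = mc_probs[:, idx, labels].std(axis=0)       # per-example spread
    keep = (conf >= conf_min) & (unc <= unc_max)
    return labels, keep
```

Filtering on uncertainty as well as confidence is what reduces the noise from poorly calibrated models: a prediction can be confidently wrong, but it is less likely to be confidently *and consistently* wrong across stochastic passes.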
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence [93.91751021370638]
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling.
Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images.
arXiv Detail & Related papers (2020-01-21T18:32:27Z)
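The FixMatch recipe summarized above (pseudo-label from the weakly-augmented view, train the strongly-augmented view against it whenever the weak view's confidence clears a threshold) reduces to a masked cross-entropy on the unlabeled batch. A minimal sketch with illustrative names, taking per-view probabilities as plain arrays:

```python
import numpy as np

def fixmatch_unlabeled_loss(weak_probs, strong_probs, tau=0.95):
    """Cross-entropy of strongly-augmented predictions against the weak
    view's argmax, masked to examples whose weak-view max probability
    exceeds the confidence threshold tau."""
    conf = weak_probs.max(axis=1)
    pseudo = weak_probs.argmax(axis=1)
    mask = conf >= tau
    if not mask.any():
        return 0.0                                   # no example qualifies
    picked = strong_probs[mask, pseudo[mask]]        # prob of pseudo-class
    return float(-np.log(picked + 1e-12).mean())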
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.