DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples
- URL: http://arxiv.org/abs/2110.13740v1
- Date: Tue, 26 Oct 2021 14:43:12 GMT
- Title: DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples
- Authors: Yi Xu, Jiandong Ding, Lu Zhang, Shuigeng Zhou
- Abstract summary: Semi-supervised learning (SSL) provides a promising way to leverage unlabeled data via pseudo labels.
When the labeled set is very small, SSL performs poorly and unstably, possibly due to the low quality of the learned pseudo labels.
We propose DP-SSL, which adopts an innovative data programming scheme to generate probabilistic labels for unlabeled data.
- Score: 32.804647684320216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scarcity of labeled data is a critical obstacle to deep learning.
Semi-supervised learning (SSL) provides a promising way to leverage unlabeled
data via pseudo labels. However, when the size of labeled data is very small
(say, a few labeled samples per class), SSL performs poorly and unstably,
possibly due to the low quality of the learned pseudo labels. In this paper, we
propose a new SSL method called DP-SSL, which adopts an innovative data
programming (DP) scheme to generate probabilistic labels for unlabeled data.
Unlike existing DP methods, which rely on human experts to provide initial
labeling functions (LFs), we develop a multiple-choice learning (MCL) based
approach to automatically generate LFs from scratch in SSL style. With the
noisy labels produced by the LFs, we design a label model to resolve conflicts
and overlaps among the noisy labels, and finally infer probabilistic labels
for unlabeled samples. Extensive experiments on four standard SSL benchmarks
show that DP-SSL provides reliable labels for unlabeled data and achieves
better classification performance on test sets than existing SSL methods,
especially when only a small number of labeled samples are available.
Concretely, for CIFAR-10 with only 40 labeled samples, DP-SSL achieves 93.82%
annotation accuracy on unlabeled data and 93.46% classification accuracy on
test data, both higher than the previous SOTA results.
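To make the data-programming idea concrete, here is a minimal sketch of aggregating noisy labeling-function (LF) votes into probabilistic labels. It is an illustration only: the log-odds weighted vote and the assumed per-LF accuracies stand in for DP-SSL's learned label model, which resolves LF conflicts in a more principled way.

```python
# Minimal sketch of data-programming-style label aggregation (illustrative,
# not the learned label model used in DP-SSL). Each labeling function (LF)
# votes for a class or abstains (-1); votes are combined into a probabilistic
# label by weighting each LF with an assumed accuracy estimate.
import numpy as np

NUM_CLASSES = 10

def aggregate_probabilistic_labels(lf_votes, lf_accuracies):
    """lf_votes: (num_samples, num_lfs) int array, -1 = abstain.
    lf_accuracies: (num_lfs,) assumed accuracy of each LF.
    Returns (num_samples, NUM_CLASSES) probabilistic labels."""
    num_samples, num_lfs = lf_votes.shape
    scores = np.zeros((num_samples, NUM_CLASSES))
    for j in range(num_lfs):
        # Log-odds weight: more accurate LFs contribute more strongly.
        w = np.log(lf_accuracies[j] / (1.0 - lf_accuracies[j]))
        voted = lf_votes[:, j] >= 0
        scores[np.arange(num_samples)[voted], lf_votes[voted, j]] += w
    # Softmax over classes yields probabilistic labels.
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy usage: 3 LFs voting on 4 unlabeled samples.
votes = np.array([[0, 0, -1], [1, 2, 1], [3, -1, 3], [-1, -1, 5]])
accs = np.array([0.8, 0.6, 0.7])
print(aggregate_probabilistic_labels(votes, accs).round(2))
```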
Related papers
- Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data [6.812609988733991]
We study SSL for high-dimensional Gaussian classification.
We analyze information-theoretic lower bounds for accurate feature selection.
We present simulations that complement our theoretical analysis.
arXiv Detail & Related papers (2024-09-05T08:21:05Z)
- Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning
Semi-supervised learning algorithms struggle to perform well when exposed to imbalanced training data.
We introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL).
SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis.
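As a rough illustration of per-class threshold adjustment on validation data (a simplified stand-in, not SEVAL's actual optimization procedure), one could pick, for each class, the lowest confidence threshold that keeps pseudo-label precision above a target:

```python
# Illustrative per-class threshold tuning on validation data (a simplified
# stand-in for SEVAL's pseudo-label optimization, not its actual procedure).
import numpy as np

def per_class_thresholds(val_probs, val_labels, target_precision=0.95,
                         grid=np.linspace(0.5, 0.99, 50)):
    """val_probs: (n, num_classes) softmax outputs on validation data.
    val_labels: (n,) true labels. Returns one threshold per class."""
    preds = val_probs.argmax(axis=1)
    conf = val_probs.max(axis=1)
    num_classes = val_probs.shape[1]
    thresholds = np.ones(num_classes)  # default: accept nothing
    for c in range(num_classes):
        mask = preds == c
        for t in grid:  # lowest threshold meeting the precision target
            sel = mask & (conf >= t)
            if sel.sum() > 0 and (val_labels[sel] == c).mean() >= target_precision:
                thresholds[c] = t
                break
    return thresholds

# Toy usage with random validation predictions (labels agree with argmax here,
# so every class that appears gets the loosest threshold in the grid).
rng = np.random.default_rng(0)
val_probs = rng.dirichlet(np.ones(10) * 0.3, size=200)
print(per_class_thresholds(val_probs, val_probs.argmax(axis=1)))
```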
arXiv Detail & Related papers (2024-07-07T13:46:22Z)
- Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation [87.17768598044427]
Traditional semi-supervised learning assumes that the feature distributions of labeled and unlabeled data are consistent.
We propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions.
Our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance in labeled, unlabeled, and even unseen distributions.
arXiv Detail & Related papers (2024-05-31T03:13:45Z)
- Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization [12.582774521907227]
Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant.
Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence values to select pseudo-labels.
We propose a prompt-based pseudo-labeling strategy with LLMs that picks unlabeled examples with more accurate pseudo-labels.
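For reference, the standard confidence-based selection described above can be sketched in a few lines; the 0.95 cutoff is an illustrative FixMatch-style convention, not a value from this paper.

```python
# Minimal sketch of standard confidence-based pseudo-label selection
# (the teacher-student scheme described above; the threshold is illustrative).
import torch
import torch.nn.functional as F

def select_pseudo_labels(logits, threshold=0.95):
    """logits: (batch, num_classes) teacher outputs on unlabeled data.
    Returns (pseudo_labels, mask) where mask marks confident samples."""
    probs = F.softmax(logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    return pseudo, conf >= threshold

logits = torch.randn(8, 10) * 3  # stand-in teacher outputs
pseudo, mask = select_pseudo_labels(logits)
print(f"kept {mask.sum().item()}/8 unlabeled samples")
```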
arXiv Detail & Related papers (2023-11-16T04:29:41Z)
- Pseudo-Labeling Based Practical Semi-Supervised Meta-Training for Few-Shot Learning [93.63638405586354]
We propose a simple and effective meta-training framework, called pseudo-labeling based meta-learning (PLML).
First, we train a classifier via common semi-supervised learning (SSL) and use it to obtain pseudo-labels for the unlabeled data.
We then build few-shot tasks from the labeled and pseudo-labeled data and design a novel fine-tuning method with feature smoothing and noise suppression.
arXiv Detail & Related papers (2022-07-14T10:53:53Z)
- Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, such models are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z)
- Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization [88.74813798138466]
Localizing keypoints of an object is a basic visual problem.
Supervised learning of a keypoint localization network often requires a large amount of data.
We propose to automatically select reliable pseudo-labeled samples with a series of dynamic thresholds.
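A toy version of a dynamic-threshold curriculum might relax the confidence cutoff as training progresses; the linear ramp below is an assumption for illustration, not the paper's learned thresholds.

```python
# Toy dynamic-threshold schedule for pseudo-label selection (illustrative;
# the paper's thresholds are adaptive, not this fixed linear ramp).
def dynamic_threshold(epoch, num_epochs, start=0.95, end=0.80):
    """Relax the confidence threshold linearly as the model improves."""
    frac = epoch / max(num_epochs - 1, 1)
    return start + (end - start) * frac

for epoch in [0, 25, 50, 75, 99]:
    print(epoch, round(dynamic_threshold(epoch, 100), 3))
```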
arXiv Detail & Related papers (2022-01-21T09:51:58Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
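In the spirit of UPS, uncertainty-aware selection can be sketched by combining confidence with the variance of MC-dropout predictions; the thresholds and the dropout-based uncertainty proxy here are illustrative assumptions, not the paper's exact criteria.

```python
# Sketch of uncertainty-aware pseudo-label selection in the spirit of UPS:
# keep a pseudo-label only if the prediction is both confident and stable
# across stochastic (MC-dropout) forward passes. Thresholds are illustrative.
import torch
import torch.nn.functional as F

def ups_style_select(model, x, passes=10, conf_thr=0.9, unc_thr=0.05):
    model.train()  # keep dropout active for MC sampling
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(passes)])
    mean_probs = probs.mean(dim=0)          # (batch, classes)
    conf, pseudo = mean_probs.max(dim=-1)
    # Std-dev of the winning class probability across passes = uncertainty.
    idx = pseudo.view(1, -1, 1).expand(passes, -1, 1)
    unc = probs.gather(-1, idx).squeeze(-1).std(dim=0)
    return pseudo, (conf >= conf_thr) & (unc <= unc_thr)

# Toy usage with a small dropout network.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Dropout(0.5), torch.nn.Linear(64, 10))
pseudo, mask = ups_style_select(model, torch.randn(16, 32))
print(f"kept {mask.sum().item()}/16 pseudo-labels")
```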
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex, novel scenario named open-set SSL, where out-of-distribution (OOD) samples are present in the unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
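As a simple illustration of discarding OOD samples from the unlabeled pool, one could threshold a max-softmax score; this is a common baseline used here for exposition, not the paper's multi-task curriculum method.

```python
# Illustrative OOD filtering for open-set SSL using the max-softmax score
# (a simple baseline stand-in, not the paper's multi-task curriculum).
import torch
import torch.nn.functional as F

def filter_ood(logits, ood_threshold=0.5):
    """Keep unlabeled samples whose max softmax probability is high enough
    to be treated as in-distribution."""
    msp = F.softmax(logits, dim=-1).max(dim=-1).values
    return msp >= ood_threshold

keep = filter_ood(torch.randn(16, 10))
print(f"retained {keep.sum().item()}/16 samples as in-distribution")
```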
arXiv Detail & Related papers (2020-07-22T10:33:55Z)