Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
- URL: http://arxiv.org/abs/2110.03006v4
- Date: Wed, 23 Aug 2023 16:47:25 GMT
- Title: Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
- Authors: Xudong Wang, Long Lian, Stella X. Yu
- Abstract summary: Unsupervised selective labeling consistently improves SSL methods over state-of-the-art active learning given labeled data.
Our work sets a new standard for practical and efficient SSL.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an unlabeled dataset and an annotation budget, we study how to
selectively label a fixed number of instances so that semi-supervised learning
(SSL) on such a partially labeled dataset is most effective. We focus on
selecting the right data to label, in addition to the usual SSL propagation of
labels from labeled data to the remaining unlabeled data. This instance selection
task is challenging, as without any labeled data we do not know what the
objective of learning should be. Intuitively, no matter what the downstream
task is, instances to be labeled must be representative and diverse: The former
would facilitate label propagation to unlabeled data, whereas the latter would
ensure coverage of the entire dataset. We capture this idea by selecting
cluster prototypes, either in a pretrained feature space, or along with feature
optimization, both without labels. Our unsupervised selective labeling
consistently improves SSL methods over state-of-the-art active learning given
labeled data, by 8 to 25 times in label efficiency. For example, it boosts
FixMatch by 10% (14%) in accuracy on CIFAR-10 (ImageNet-1K) with 0.08% (0.2%)
labeled data, demonstrating that a small amount of computation spent on selecting
which data to label brings a significant gain, especially under a low annotation budget. Our
work sets a new standard for practical and efficient SSL.
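To make "representative and diverse" concrete, here is a minimal sketch of prototype-based selection, assuming features from a label-free (e.g., self-supervised) encoder; the K-means choice, hyperparameters, and function name are illustrative, not the authors' exact pipeline.

```python
# Illustrative sketch (not the paper's exact pipeline): cluster unlabeled
# features into as many clusters as the labeling budget, then pick the
# instance nearest each centroid. Clusters give diversity (coverage of the
# dataset); centroid proximity gives representativeness (easier label
# propagation to the rest of the cluster).
import numpy as np
from sklearn.cluster import KMeans

def select_instances_to_label(features: np.ndarray, budget: int) -> np.ndarray:
    """features: (N, D) embeddings, e.g. L2-normalized self-supervised features.
    Returns `budget` indices of cluster prototypes to send for annotation."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(features)
    selected = []
    for k in range(budget):
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[k], axis=1)
        selected.append(members[np.argmin(dists)])  # prototype of cluster k
    return np.asarray(selected)
```

Setting the number of clusters equal to the budget ties diversity (one pick per cluster) to representativeness (the pick nearest each centroid); per the abstract, the paper also considers selecting prototypes jointly with feature optimization rather than in a fixed pretrained space.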
Related papers
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
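In the spirit of DIPS's data characterization, here is a hedged sketch of selecting (pseudo-)labeled examples by their learning dynamics; the two metrics and the cutoffs below are assumptions for illustration, not DIPS's exact definitions.

```python
# Hedged sketch: characterize each (pseudo-)labeled example by its
# confidence on its assigned label across training checkpoints, and keep
# only examples that are learned confidently and stably. Metrics and
# thresholds are illustrative assumptions.
import numpy as np

def select_by_dynamics(checkpoint_probs: np.ndarray, labels: np.ndarray,
                       conf_cut: float = 0.7, var_cut: float = 0.1) -> np.ndarray:
    """checkpoint_probs: (E, N, C) softmax outputs saved at E checkpoints."""
    p_true = checkpoint_probs[:, np.arange(len(labels)), labels]  # (E, N)
    confidence = p_true.mean(axis=0)   # average confidence in the label
    variability = p_true.std(axis=0)   # stability across training
    keep = (confidence >= conf_cut) & (variability <= var_cut)
    return np.where(keep)[0]
```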
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are based on instance-wise consistency between different data transformations.
We propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the labeled and unlabeled data (a rough sketch of such a measure follows below).
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
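As a rough illustration of a cross-sharpness measure (the SAM-style perturbation and this exact formulation are assumptions, not FlatMatch's published loss), one can probe how much the unlabeled loss degrades at a worst-case point of the labeled loss:

```python
# Hedged sketch: take a SAM-style ascent step on the labeled loss, then
# measure the gap in the unlabeled loss between the perturbed and the
# original weights. Formulation and rho are illustrative assumptions.
import torch

def cross_sharpness(model, labeled_loss_fn, unlabeled_loss_fn, rho=0.05):
    """labeled_loss_fn / unlabeled_loss_fn: callables returning scalar losses."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(labeled_loss_fn(), params)  # ascent direction
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        base = unlabeled_loss_fn().item()
        eps = [rho * g / norm for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)                    # step to the worst-case weights
        perturbed = unlabeled_loss_fn().item()
        for p, e in zip(params, eps):
            p.sub_(e)                    # restore the original weights
    return perturbed - base              # cross-sharpness gap
```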
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which weights individual instances for adversarial training while assigning new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Impact of Strategic Sampling and Supervision Policies on Semi-supervised Learning
In semi-supervised representation learning frameworks, when labelled data are extremely scarce, the quality and representativeness of these samples become increasingly important.
Existing work on semi-supervised learning randomly samples a limited number of data points for labelling.
All these labelled samples are then used along with the unlabelled data throughout the training process.
arXiv Detail & Related papers (2022-11-27T18:29:54Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, adaptively selects which unlabeled examples to train on (see the sketch below).
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
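A minimal sketch of dynamic thresholding in the spirit of Dash; the geometric decay schedule and constants here are illustrative assumptions rather than the paper's derived schedule.

```python
# Hedged sketch: keep only unlabeled examples whose pseudo-label loss is
# below a threshold that decays as training progresses, so the selection
# tightens as the model improves. Schedule and constants are assumptions.
import numpy as np

def select_unlabeled(losses: np.ndarray, step: int,
                     rho0: float = 2.0, gamma: float = 1.1) -> np.ndarray:
    """losses: per-example losses on pseudo-labeled data at this step."""
    threshold = rho0 * gamma ** (-step)     # dynamically decreasing cutoff
    return np.where(losses < threshold)[0]  # indices kept for training
```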
- Weakly Supervised Pseudo-Label assisted Learning for ALS Point Cloud Semantic Segmentation
Competitive point cloud semantic segmentation results usually rely on a large amount of labeled data.
In this study, we propose a pseudo-labeling strategy to obtain accurate results with limited ground truth.
arXiv Detail & Related papers (2021-05-05T08:07:21Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to erroneous high-confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework that improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process (a sketch of the selection rule follows below).
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
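A hedged sketch of uncertainty-aware selection in the spirit of UPS: accept a pseudo-label only when the prediction is both confident and stable. Using MC-dropout spread as the uncertainty estimate and these particular thresholds are assumptions for illustration.

```python
# Illustrative rule: a pseudo-label is accepted only if its mean confidence
# is high AND the winning class's probability varies little across
# stochastic forward passes (MC dropout). Thresholds are assumed values.
import numpy as np

def select_pseudo_labels(mc_probs: np.ndarray,
                         conf_thresh: float = 0.9,
                         unc_thresh: float = 0.05):
    """mc_probs: (T, N, C) softmax outputs from T stochastic passes."""
    mean_probs = mc_probs.mean(axis=0)                # (N, C)
    labels = mean_probs.argmax(axis=1)                # candidate pseudo-labels
    confidence = mean_probs.max(axis=1)
    spread = mc_probs.std(axis=0)[np.arange(len(labels)), labels]
    keep = np.where((confidence >= conf_thresh) & (spread <= unc_thresh))[0]
    return keep, labels[keep]
```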
- Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning
Unsupervised semantic aggregation based on the Triplet Mutual Information (T-MI) loss is explored to generate semantic labels for unlabeled data.
A feature pool that stores the labeled samples is dynamically updated to assign proxy labels for unlabeled data (an illustrative centroid-based version is sketched below).
Experiments and analysis validate that USADTM achieves top performance.
arXiv Detail & Related papers (2020-10-12T08:17:56Z)
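A hedged sketch of proxy labeling from a labeled feature pool; nearest-centroid matching under cosine similarity is an assumed simplification of the paper's aggregation and template matching, not its exact mechanism.

```python
# Illustrative proxy labeling: build one centroid per class from the pooled
# labeled features, then give each unlabeled feature the label of its most
# similar centroid. A simplification of the paper's matching scheme.
import numpy as np

def proxy_labels(pool_feats: np.ndarray, pool_labels: np.ndarray,
                 unlabeled_feats: np.ndarray) -> np.ndarray:
    classes = np.unique(pool_labels)
    centroids = np.stack([pool_feats[pool_labels == c].mean(axis=0)
                          for c in classes])  # (num_classes, D)
    u = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return classes[(u @ c.T).argmax(axis=1)]  # label of nearest centroid
```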