Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning
- URL: http://arxiv.org/abs/2106.00209v1
- Date: Tue, 1 Jun 2021 03:58:18 GMT
- Title: Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning
- Authors: Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu
Wang, Alan Yuille
- Abstract summary: Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
- Score: 26.069534478556527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-Supervised Learning (SSL) has shown its strong ability in utilizing
unlabeled data when labeled data is scarce. However, most SSL algorithms work
under the assumption that the class distributions are balanced in both training
and test sets. In this work, we consider the problem of SSL on class-imbalanced
data, which better reflects real-world situations but has only received limited
attention so far. In particular, we decouple the training of the representation
and the classifier, and systematically investigate the effects of different
data re-sampling techniques when training the whole network including a
classifier as well as fine-tuning the feature extractor only. We find that data
re-sampling is of critical importance to learn a good classifier as it
increases the accuracy of the pseudo-labels, in particular for the minority
classes in the unlabeled data. Interestingly, we find that accurate
pseudo-labels do not help when training the feature extractor; on the contrary,
data re-sampling harms the training of the feature extractor. This finding goes
against the general intuition that wrong pseudo-labels always harm model
performance in SSL. Based on these findings, we suggest re-thinking the current
paradigm of having a single data re-sampling strategy and
develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on
class-imbalanced data. BiS implements two different re-sampling strategies for
training the feature extractor and the classifier and integrates this decoupled
training into an end-to-end framework... Code will be released at
https://github.com/TACJu/Bi-Sampling.
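To make the decoupled training concrete, below is a minimal PyTorch-style sketch of the Bi-Sampling idea as described in the abstract, not the authors' released implementation (see the GitHub link above). The toy data, the make_balanced_sampler helper, and the restriction to labeled data (the pseudo-labeling of unlabeled data is omitted for brevity) are illustrative assumptions; the point is only that the feature extractor is updated with randomly sampled batches while the classifier is updated with class-balanced (re-sampled) batches, within a single end-to-end loop.

```python
# Minimal sketch of the Bi-Sampling (BiS) idea: one re-sampling strategy for
# the feature extractor, another for the classifier. NOT the authors' code;
# the model, data, and helper names are illustrative assumptions, and the
# unlabeled/pseudo-label branch of SSL is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy labeled set with a 9:1 class imbalance.
x = torch.randn(1000, 32)
y = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])
dataset = TensorDataset(x, y)

def make_balanced_sampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Class-balanced re-sampling: weight each example by 1 / its class frequency."""
    counts = torch.bincount(labels).float()
    weights = (1.0 / counts)[labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Two loaders over the same data implement the two re-sampling strategies.
random_loader = DataLoader(dataset, batch_size=64, shuffle=True)       # feature extractor
balanced_loader = DataLoader(dataset, batch_size=64,
                             sampler=make_balanced_sampler(y))         # classifier

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
classifier = nn.Linear(64, 2)
opt_feat = torch.optim.SGD(feature_extractor.parameters(), lr=0.1)
opt_clf = torch.optim.SGD(classifier.parameters(), lr=0.1)

for epoch in range(5):
    for (xr, yr), (xb, yb) in zip(random_loader, balanced_loader):
        # Branch 1: randomly sampled batch updates the feature extractor only.
        loss_feat = F.cross_entropy(classifier(feature_extractor(xr)), yr)
        opt_feat.zero_grad()
        opt_clf.zero_grad()
        loss_feat.backward()
        opt_feat.step()  # classifier gradients from this branch are discarded

        # Branch 2: class-balanced batch updates the classifier only
        # (features are detached so no gradient reaches the extractor).
        with torch.no_grad():
            feats = feature_extractor(xb)
        loss_clf = F.cross_entropy(classifier(feats), yb)
        opt_clf.zero_grad()
        loss_clf.backward()
        opt_clf.step()
```

In the full SSL setting the same loop would also mix in unlabeled examples carrying pseudo-labels; the sketch only shows how the two branches can use different samplers while still being trained jointly.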
Related papers
- Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data [6.812609988733991]
We study SSL for high dimensional Gaussian classification.
We analyze information theoretic lower bounds for accurate feature selection.
We present simulations that complement our theoretical analysis.
arXiv Detail & Related papers (2024-09-05T08:21:05Z)
- SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning [106.46648817126984]
In this paper, we study the challenging and realistic open-set SSL setting.
The goal is both to correctly classify inliers and to detect outliers.
We find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data.
arXiv Detail & Related papers (2023-11-17T15:14:40Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabeled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Transfer and Share: Semi-Supervised Learning from Long-Tailed Data [27.88381366842497]
We present TRAS (TRAnsfer and Share) to effectively utilize long-tailed semi-supervised data.
TRAS transforms the imbalanced pseudo-label distribution of a traditional SSL model.
It then transfers the distribution to a target model such that the minority class will receive significant attention.
arXiv Detail & Related papers (2022-05-26T13:37:59Z)
- An analysis of over-sampling labeled data in semi-supervised learning with FixMatch [66.34968300128631]
Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
arXiv Detail & Related papers (2022-01-03T12:22:26Z)
- ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning [6.866717993664787]
Existing semi-supervised learning (SSL) algorithms assume class-balanced datasets.
We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data.
The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.
arXiv Detail & Related papers (2021-10-20T04:07:48Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- BiSTF: Bilateral-Branch Self-Training Framework for Semi-Supervised Large-scale Fine-Grained Recognition [28.06659482245647]
Semi-supervised Fine-Grained Recognition is a challenging task due to data imbalance, high inter-class similarity and domain mismatch.
We propose the Bilateral-Branch Self-Training Framework (BiSTF) to improve existing semi-supervised learning on class-imbalanced and domain-shifted fine-grained data.
We show BiSTF outperforms the existing state-of-the-art SSL on Semi-iNat dataset.
arXiv Detail & Related papers (2021-07-14T15:28:54Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and the scarcity of samples in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.