CReST: A Class-Rebalancing Self-Training Framework for Imbalanced
Semi-Supervised Learning
- URL: http://arxiv.org/abs/2102.09559v1
- Date: Thu, 18 Feb 2021 18:59:57 GMT
- Title: CReST: A Class-Rebalancing Self-Training Framework for Imbalanced
Semi-Supervised Learning
- Authors: Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille, Fan Yang
- Abstract summary: We propose Class-Rebalancing Self-Training (CReST) to improve existing SSL methods on class-imbalanced data.
CReST iteratively retrains a baseline SSL model with a labeled set expanded by pseudo-labeled samples, favoring minority classes.
We show that CReST and CReST+ improve state-of-the-art SSL algorithms on various class-imbalanced datasets.
- Score: 15.671523625324388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning on class-imbalanced data, although a realistic
problem, has been understudied. While existing semi-supervised learning (SSL)
methods are known to perform poorly on minority classes, we find that they
still generate high precision pseudo-labels on minority classes. By exploiting
this property, in this work, we propose Class-Rebalancing Self-Training
(CReST), a simple yet effective framework to improve existing SSL methods on
class-imbalanced data. CReST iteratively retrains a baseline SSL model with a
labeled set expanded by adding pseudo-labeled samples from an unlabeled set,
where pseudo-labeled samples from minority classes are selected more frequently
according to an estimated class distribution. We also propose CReST+, a variant
that adds a progressive distribution alignment to adaptively adjust the
rebalancing strength. We show that CReST and CReST+ improve state-of-the-art SSL algorithms
on various class-imbalanced datasets and consistently outperform other popular
rebalancing methods.
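The class-rebalancing selection described in the abstract can be illustrated with a short sketch. This is a hedged reconstruction, not the authors' code: it assumes the paper's scheme of sorting classes by labeled-set frequency and keeping pseudo-labels from the l-th most frequent class at rate (N_{L+1-l}/N_1)^alpha, so the rarest class's pseudo-labels are always kept while the most frequent class's are subsampled.

```python
import numpy as np

def crest_sampling_rates(class_counts, alpha=0.5):
    """Per-class keep rates for pseudo-labeled samples (CReST-style sketch).

    Classes are ranked by labeled-set frequency; the l-th most frequent
    class gets rate (N_{L+1-l} / N_1) ** alpha. The rarest class's
    pseudo-labels are kept at rate 1.0; the most frequent class's are
    kept at the inverse-imbalance rate raised to alpha.
    """
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(-counts)       # class indices, most to least frequent
    sorted_counts = counts[order]     # N_1 >= N_2 >= ... >= N_L
    # Mirror the sorted counts: rank-l class gets the (L+1-l)-th count.
    rates_sorted = (sorted_counts[::-1] / sorted_counts[0]) ** alpha
    rates = np.empty_like(rates_sorted)
    rates[order] = rates_sorted       # scatter back to original class order
    return rates
```

With `alpha=0` every class is fully sampled (plain self-training); larger `alpha` suppresses majority-class pseudo-labels more aggressively, which is the knob that CReST+'s progressive alignment would adapt over generations.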
Related papers
- Sampling Control for Imbalanced Calibration in Semi-Supervised Learning [14.563492336625004]
Class imbalance remains a critical challenge in semi-supervised learning (SSL).
We propose a unified framework, SC-SSL, which suppresses model bias through decoupled sampling control.
arXiv Detail & Related papers (2025-11-24T05:15:58Z)
- CalibrateMix: Guided-Mixup Calibration of Image Semi-Supervised Models [49.588973929678765]
CalibrateMix is a mixup-based approach that aims to improve the calibration of SSL models.
Our method achieves lower expected calibration error (ECE) and superior accuracy compared to existing SSL approaches.
arXiv Detail & Related papers (2025-11-17T04:43:53Z)
- Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning [6.904448748214652]
Semi-supervised learning algorithms struggle to perform well when exposed to imbalanced training data.
We introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL)
SEVAL adapts to specific tasks with improved pseudo-labels accuracy and ensures pseudo-labels correctness on a per-class basis.
arXiv Detail & Related papers (2024-07-07T13:46:22Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme to dynamically alter the score thresholds of positive and negative pseudo-labels for each class during the training.
We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
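The percentile-based thresholding idea can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: function names and the percentile value are hypothetical, and per-class scores are taken as an (N, C) probability array over an unlabeled batch.

```python
import numpy as np

def per_class_percentile_thresholds(probs, percentile=80.0):
    """Set each class's pseudo-label threshold at a fixed percentile of that
    class's predicted scores over the unlabeled batch, so thresholds track
    the evolving score distribution instead of one static cutoff.
    `probs` is an (N, C) array of per-class scores."""
    return np.percentile(probs, percentile, axis=0)

def select_pseudo_labels(probs, thresholds):
    """Boolean mask of (sample, class) pairs whose score clears the
    corresponding class threshold and would receive a pseudo-label."""
    return probs >= thresholds
```

Because each threshold is a quantile of its own class's scores, a fixed fraction of the batch is admitted per class, which keeps rare classes from being starved by a single global confidence cutoff.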
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
- BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets [14.739359755029353]
Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled data sets.
We propose BASIL, a novel algorithm that optimizes submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop.
arXiv Detail & Related papers (2022-03-10T21:34:08Z)
- Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in the real-world setting.
Our proposed CCSSL has significant performance improvements over the state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements over the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [26.069534478556527]
Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
arXiv Detail & Related papers (2021-06-01T03:58:18Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
- Class-Imbalanced Semi-Supervised Learning [33.94685366079589]
Semi-Supervised Learning (SSL) has achieved great success in overcoming the difficulties of labeling and making full use of unlabeled data.
We introduce a task of class-imbalanced semi-supervised learning (CISSL), which refers to semi-supervised learning with class-imbalanced data.
Our method shows better performance than the conventional methods in the CISSL environment.
arXiv Detail & Related papers (2020-02-17T07:48:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.