An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2211.11086v2
- Date: Thu, 18 Jan 2024 14:40:43 GMT
- Title: An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised
Learning
- Authors: Hao Chen, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie,
Marios Savvides, Bhiksha Raj
- Abstract summary: Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
- Score: 103.65758569417702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning (SSL) has shown great promise in leveraging
unlabeled data to improve model performance. While standard SSL assumes uniform
class distributions, we consider a more realistic and challenging setting called
imbalanced SSL, where imbalanced class distributions occur in both labeled and
unlabeled data. Although existing work tackles this challenge, performance
degrades under severe imbalance because the class imbalance is not reduced
sufficiently or effectively. In this paper, we study a simple yet overlooked
baseline -- SimiS -- which tackles data imbalance by supplementing the labeled
set with pseudo-labeled samples, in proportion to each class's frequency gap
relative to the most frequent class. This simple baseline turns out to be
highly effective at reducing class imbalance. It outperforms existing methods
by a significant margin, e.g., 12.8%, 13.6%, and 16.7% over the previous
state of the art on CIFAR100-LT, FOOD101-LT, and ImageNet127, respectively.
The reduced imbalance leads to faster convergence and better pseudo-label
accuracy for SimiS. The simplicity of our method also makes it easy to combine
with other re-balancing techniques for further gains. Moreover, our method is
robust to a wide range of data distributions, which makes it highly practical.
Code will be publicly available.
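A minimal sketch of the core SimiS step, assuming numpy arrays and a confidence threshold of our choosing (the function name, threshold, and selection-by-confidence rule are our illustration, not the authors' reference code):

```python
import numpy as np

def simis_supplement(labeled_counts, pseudo_probs, conf_thresh=0.95):
    """Sketch of SimiS-style supplementing: fill each class's labeled-set
    deficit (relative to the most frequent class) with confident pseudo-labels.

    labeled_counts: (C,) number of labeled samples per class.
    pseudo_probs:   (N, C) predicted class probabilities for unlabeled data.
    Returns indices of unlabeled samples to add and their pseudo-labels.
    """
    labeled_counts = np.asarray(labeled_counts)
    deficit = labeled_counts.max() - labeled_counts      # gap to the head class
    pseudo_labels = pseudo_probs.argmax(axis=1)
    confidence = pseudo_probs.max(axis=1)

    chosen_idx, chosen_lab = [], []
    for c, need in enumerate(deficit):
        # candidates: confident predictions of class c, most confident first
        cand = np.where((pseudo_labels == c) & (confidence >= conf_thresh))[0]
        cand = cand[np.argsort(-confidence[cand])][:need]
        chosen_idx.extend(cand.tolist())
        chosen_lab.extend([c] * len(cand))
    return np.array(chosen_idx, dtype=int), np.array(chosen_lab, dtype=int)
```

The selected (index, pseudo-label) pairs would then be treated as ordinary labeled data in subsequent training, which is what flattens the effective class distribution.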
Related papers
- A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL.
We propose a lightweight channel-based ensemble method to consolidate multiple inferior PLs into one that is theoretically guaranteed to be unbiased and low-variance.
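As a rough intuition for why consolidating several pseudo-labels helps (our illustration; the paper's channel-based construction is more involved), averaging K noisy probability estimates of the same sample shrinks their variance:

```python
import torch

def consolidate_pseudo_labels(prob_list):
    """Illustration only: combining K noisy probability estimates of the
    same samples by averaging lowers the variance of the consolidated
    pseudo-label (the paper's channel-based construction is more involved).

    prob_list: list of K tensors, each (N, C), rows summing to 1.
    """
    stacked = torch.stack(prob_list, dim=0)   # (K, N, C)
    mean_probs = stacked.mean(dim=0)          # variance shrinks roughly as 1/K
    confidence, pseudo_labels = mean_probs.max(dim=1)
    return pseudo_labels, confidence
```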
arXiv Detail & Related papers (2024-03-27T09:49:37Z)
- BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning [0.9160375060389779]
In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions.
Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner.
Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets.
arXiv Detail & Related papers (2024-03-04T06:43:16Z)
- Gradient Reweighting: Towards Imbalanced Class-Incremental Learning [8.438092346233054]
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data.
A major challenge of CIL arises when it is applied to real-world data with non-uniform distributions.
We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL.
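The summary does not detail the paper's reweighting scheme; as a baseline illustration of countering skewed gradient updates, standard inverse-frequency class weighting looks like this:

```python
import torch
import torch.nn.functional as F

def inverse_frequency_loss(logits, targets, class_counts):
    """Baseline illustration (not the paper's exact scheme): weight each
    class's loss by inverse frequency so rare classes contribute gradients
    comparable to frequent ones."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    weights = counts.sum() / (len(counts) * counts)   # inverse frequency
    return F.cross_entropy(logits, targets, weight=weights)
```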
arXiv Detail & Related papers (2024-02-28T18:08:03Z)
- Class-Imbalanced Graph Learning without Class Rebalancing [62.1368829847041]
Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models.
In this work, we approach the root cause of class-imbalance bias from a topological perspective.
We devise a lightweight topological augmentation framework BAT to mitigate the class-imbalance bias without class rebalancing.
arXiv Detail & Related papers (2023-08-27T19:01:29Z)
- On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning [50.48888534815361]
In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL.
PL is a simple and representative SSL method that transforms SSL problems into supervised learning by creating pseudo-labels for unlabeled data.
We propose to improve PL in class-mismatched SSL with two components -- Re-balanced Pseudo-Labeling (RPL) and Semantic Exploration Clustering (SEC).
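For context, the confidence-thresholded pseudo-labeling that RPL re-balances can be sketched as follows (FixMatch-style; the threshold value is our assumption, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits_u, threshold=0.95):
    """Standard confidence-thresholded pseudo-labeling: treat confident
    argmax predictions on unlabeled data as hard supervised targets."""
    probs = F.softmax(logits_u.detach(), dim=1)
    confidence, targets = probs.max(dim=1)
    mask = (confidence >= threshold).float()   # keep only confident samples
    loss = F.cross_entropy(logits_u, targets, reduction="none")
    return (loss * mask).mean()
```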
arXiv Detail & Related papers (2023-01-15T03:21:59Z)
- BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets [14.739359755029353]
Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled data sets.
We propose BASIL, a novel algorithm that optimizes submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop.
arXiv Detail & Related papers (2022-03-10T21:34:08Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the ratio between positive and negative labels for each class during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
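One plausible reading of how the positive-to-negative ratio could be used, sketched under our own assumptions about the masking rule (this is not the paper's reference implementation):

```python
import numpy as np

def partial_label_mask(y, target_ratio, rng=None):
    """Hedged sketch of PLM-style masking for multi-label targets.

    y: (N, C) binary label matrix. target_ratio: (C,) desired
    positive-to-negative ratio per class. Over-represented positives are
    randomly masked out of the loss so the effective ratio moves toward
    the target. (The exact masking rule here is our assumption.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    N, C = y.shape
    mask = np.ones_like(y, dtype=float)       # 1 = label contributes to loss
    pos = y.sum(axis=0)
    ratio = pos / np.maximum(N - pos, 1)      # current pos:neg ratio per class
    for c in range(C):
        if ratio[c] > target_ratio[c]:        # class over-represented
            drop_p = 1.0 - target_ratio[c] / ratio[c]
            pos_idx = np.where(y[:, c] == 1)[0]
            drop = rng.random(len(pos_idx)) < drop_p
            mask[pos_idx[drop], c] = 0.0      # ignore these positive labels
    return mask  # multiply element-wise into a per-label BCE loss
```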
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop the Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
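DARP itself refines pseudo-labels by solving a constrained optimization; as a simpler illustration of the same goal, ReMixMatch-style distribution alignment rescales pseudo-label marginals toward a target class distribution:

```python
import numpy as np

def align_pseudo_distribution(pseudo_probs, target_dist, eps=1e-8):
    """Rescale pseudo-label probabilities so their class marginal moves
    toward a target distribution (ReMixMatch-style alignment; shown only
    as an illustration of DARP's goal, not its actual algorithm).

    pseudo_probs: (N, C) model probabilities on unlabeled data.
    target_dist:  (C,) estimated true class distribution.
    """
    current = pseudo_probs.mean(axis=0)                # model's class marginal
    scaled = pseudo_probs * (target_dist / (current + eps))
    return scaled / scaled.sum(axis=1, keepdims=True)  # renormalize per sample
```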
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
- Class-Imbalanced Semi-Supervised Learning [33.94685366079589]
Semi-Supervised Learning (SSL) has achieved great success in overcoming the difficulties of labeling and making full use of unlabeled data.
We introduce a task of class-imbalanced semi-supervised learning (CISSL), which refers to semi-supervised learning with class-imbalanced data.
Our method shows better performance than the conventional methods in the CISSL environment.
arXiv Detail & Related papers (2020-02-17T07:48:47Z)