ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
- URL: http://arxiv.org/abs/2110.10368v1
- Date: Wed, 20 Oct 2021 04:07:48 GMT
- Title: ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
- Authors: Hyuck Lee, Seungjae Shin, Heeyoung Kim
- Abstract summary: Existing semi-supervised learning (SSL) algorithms assume class-balanced datasets.
We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data.
The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing semi-supervised learning (SSL) algorithms typically assume
class-balanced datasets, although the class distributions of many real-world
datasets are imbalanced. In general, classifiers trained on a class-imbalanced
dataset are biased toward the majority classes. This issue becomes more
problematic for SSL algorithms because they utilize the biased prediction of
unlabeled data for training. However, traditional class-imbalanced learning
techniques, which are designed for labeled data, cannot be readily combined
with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that
can effectively use unlabeled data, while mitigating class imbalance by
introducing an auxiliary balanced classifier (ABC) of a single layer, which is
attached to a representation layer of an existing SSL algorithm. The ABC is
trained with a class-balanced loss on each minibatch, while exploiting the
high-quality representations that the backbone SSL algorithm learns from all
data points in the minibatch, thereby avoiding overfitting and information
loss. Moreover, we use consistency regularization, a recent SSL technique for
utilizing unlabeled data, in a modified way: the ABC is trained to be balanced
among the classes by selecting unlabeled data with the same probability for
each class. The proposed
algorithm achieves state-of-the-art performance in various class-imbalanced SSL
experiments using four benchmark datasets.
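The class-balanced loss described in the abstract can be sketched as a per-example Bernoulli mask whose keep probability is inversely proportional to class frequency, so every class contributes equally to the loss in expectation. This is an illustrative reading of the abstract, not the paper's code; the function name `balanced_mask` and the numpy formulation are assumptions.

```python
import numpy as np

def balanced_mask(labels, class_counts, rng):
    """Draw a 0/1 mask so each class contributes equally (in expectation)
    to the minibatch loss: the rarest class is kept with probability 1,
    a class k times more frequent is kept with probability 1/k."""
    class_counts = np.asarray(class_counts, dtype=float)
    keep_prob = class_counts.min() / class_counts   # shape: (num_classes,)
    p = keep_prob[np.asarray(labels)]               # per-example keep probability
    return (rng.random(len(labels)) < p).astype(float)

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 1, 1, 2])  # class 0 is the majority
mask = balanced_mask(labels, class_counts=[40, 20, 10], rng=rng)
# Expected loss weight per class: 4 * (10/40) = 2 * (10/20) = 1 * 1
```

Multiplying the per-example cross-entropy by this mask yields a loss that is balanced across classes without discarding any representations, since the backbone still trains on the full minibatch.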
Related papers
- Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data [6.812609988733991]
We study SSL for high dimensional Gaussian classification.
We analyze information theoretic lower bounds for accurate feature selection.
We present simulations that complement our theoretical analysis.
arXiv Detail & Related papers (2024-09-05T08:21:05Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning [49.07038093130949]
Long-tailed semi-supervised learning (LTSSL) algorithms assume that the class distributions of labeled and unlabeled data are almost identical.
We propose BOAT, a simple new method that can effectively utilize unlabeled data from unknown class distributions.
We show that BOAT achieves state-of-the-art performance on a variety of standard LTSSL benchmarks.
arXiv Detail & Related papers (2024-06-19T03:35:26Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets [14.739359755029353]
Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled data sets.
We propose BASIL, a novel algorithm that optimizes the submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop.
arXiv Detail & Related papers (2022-03-10T21:34:08Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [26.069534478556527]
Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
arXiv Detail & Related papers (2021-06-01T03:58:18Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
- Class-Imbalanced Semi-Supervised Learning [33.94685366079589]
Semi-Supervised Learning (SSL) has achieved great success in overcoming the difficulties of labeling and making full use of unlabeled data.
We introduce a task of class-imbalanced semi-supervised learning (CISSL), which refers to semi-supervised learning with class-imbalanced data.
Our method shows better performance than the conventional methods in the CISSL environment.
arXiv Detail & Related papers (2020-02-17T07:48:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.