Related papers: Leveraging Label Proportion Prior for Class-Imbalanced Semi-Supervised Learning

Leveraging Label Proportion Prior for Class-Imbalanced Semi-Supervised Learning

URL: http://arxiv.org/abs/2603.02957v1
Date: Tue, 03 Mar 2026 13:09:19 GMT
Title: Leveraging Label Proportion Prior for Class-Imbalanced Semi-Supervised Learning
Authors: Kohki Akiba, Shinnosuke Matsuo, Shota Harada, Ryoma Bise,
Abstract summary: Semi-supervised learning (SSL) often suffers under class imbalance, where pseudo-labeling amplifies majority bias and suppresses minority performance.<n>We introduce Proportion Loss from label proportions (LLP) into SSL as a regularization term. Proportion Loss aligns model predictions with the global class distribution, mitigating bias across both majority and minority classes.<n>Experiments on the Long-tailed CIFAR-10 benchmark show that integrating Proportion Loss into FixMatch and ReMixMatch consistently improves performance over the baselines across imbalance severities and label ratios.
Score: 7.32639535694124
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semi-supervised learning (SSL) often suffers under class imbalance, where pseudo-labeling amplifies majority bias and suppresses minority performance. We address this issue with a lightweight framework that, to our knowledge, is the first to introduce Proportion Loss from learning from label proportions (LLP) into SSL as a regularization term. Proportion Loss aligns model predictions with the global class distribution, mitigating bias across both majority and minority classes. To further stabilize training, we formulate a stochastic variant that accounts for fluctuations in mini-batch composition. Experiments on the Long-tailed CIFAR-10 benchmark show that integrating Proportion Loss into FixMatch and ReMixMatch consistently improves performance over the baselines across imbalance severities and label ratios, and achieves competitive or superior results compared to existing CISSL methods, particularly under scarce-label conditions.

Related papers

Sampling Control for Imbalanced Calibration in Semi-Supervised Learning [14.563492336625004]
Class imbalance remains a critical challenge in semi-supervised learning (SSL)<n>We propose a unified framework, SC-SSL, which suppresses model bias through decoupled sampling control.
arXiv Detail & Related papers (2025-11-24T05:15:58Z)
SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples [54.760757107700755]
Semi-Supervised Learning (SSL) can leverage abundant unlabeled data to boost model performance.<n>The class-imbalanced data distribution in real-world scenarios poses great challenges to SSL, resulting in performance degradation.<n>We propose a method that enhances the performance of Imbalanced Semi-Supervised Learning by Mining Hard Examples (SeMi)
arXiv Detail & Related papers (2025-01-10T14:35:16Z)
Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration [18.376601653387315]
Longtailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions. We introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework.
arXiv Detail & Related papers (2023-06-07T17:50:59Z)
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC) We find that most LT-MLC and PL-MLC approaches fail to solve the degradation-MLC. We propose an end-to-end learning framework: textbfCOrrection $rightarrow$ textbfModificattextbfIon $rightarrow$ balantextbfCe.
arXiv Detail & Related papers (2023-04-20T20:05:08Z)
An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance. We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data. We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme to dynamically alter the score thresholds of positive and negative pseudo-labels for each class during the training. We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation. We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models. We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop Distribution Aligning Refinery of Pseudo-label (DARP) algorithm. We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
Class-Imbalanced Semi-Supervised Learning [33.94685366079589]
Semi-Supervised Learning (SSL) has achieved great success in overcoming the difficulties of labeling and making full use of unlabeled data. We introduce a task of class-imbalanced semi-supervised learning (CISSL), which refers to semi-supervised learning with class-imbalanced data. Our method shows better performance than the conventional methods in the CISSL environment.
arXiv Detail & Related papers (2020-02-17T07:48:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.