Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning
- URL: http://arxiv.org/abs/2201.13192v3
- Date: Sun, 10 Mar 2024 13:49:43 GMT
- Title: Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning
- Authors: Emilio Dorigatti, Jann Goschenhofer, Benjamin Schubert, Mina Rezaei,
Bernd Bischl
- Abstract summary: We propose to tackle the issues of imbalanced datasets and model calibration in a positive-unlabeled learning setting.
By boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set.
In a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings.
- Score: 10.014356492742074
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positive-unlabeled learning (PUL) aims at learning a binary classifier from
only positive and unlabeled training data. Even though real-world applications
often involve imbalanced datasets where the majority of examples belong to one
class, most contemporary approaches to PUL do not investigate performance in
this setting, thus severely limiting their applicability in practice. In this
work, we thus propose to tackle the issues of imbalanced datasets and model
calibration in a PUL setting through an uncertainty-aware pseudo-labeling
procedure (PUUPL): by boosting the signal from the minority class,
pseudo-labeling expands the labeled dataset with new samples from the unlabeled
set, while explicit uncertainty quantification prevents the emergence of
harmful confirmation bias, leading to increased predictive performance. In a
series of experiments, PUUPL yields substantial performance gains in highly
imbalanced settings while also showing strong performance against recent
baselines in balanced PU scenarios. We furthermore provide ablations and
sensitivity analyses to shed light on PUUPL's individual components. Finally, a
real-world application with an imbalanced dataset confirms the advantage of our
approach.
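The selection loop at the heart of this approach can be pictured with a minimal sketch, assuming a bootstrap ensemble whose disagreement acts as the uncertainty estimate. Function names, hyperparameters, and the use of logistic regression are illustrative assumptions; the paper's actual method uses deep ensembles and additional machinery (e.g., PU risk estimation and label re-assignment) that is omitted here.

```python
# Minimal sketch of uncertainty-aware pseudo-label selection for PU learning.
# Everything here is an illustrative assumption, not the authors' reference code.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(X_pos, X_unl, n_models=5, n_select=100, rng=None):
    """One round: provisionally treat unlabeled data as negatives (a common PU
    simplification; the paper trains with a PU risk estimator instead), fit a
    bootstrap ensemble, and pseudo-label only the lowest-uncertainty samples."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])

    # Disagreement across bootstrap members is the epistemic-uncertainty proxy
    # used to rank candidate pseudo-labels.
    probs = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        probs.append(clf.predict_proba(X_unl)[:, 1])
    probs = np.stack(probs)            # shape: (n_models, n_unlabeled)

    mean_p = probs.mean(axis=0)
    uncertainty = probs.std(axis=0)    # ensemble disagreement

    # Keep only the most certain unlabeled points and assign their predicted label.
    chosen = np.argsort(uncertainty)[:n_select]
    pseudo_labels = (mean_p[chosen] >= 0.5).astype(int)
    return chosen, pseudo_labels, uncertainty
```

Each round would move the selected samples into the labeled set and retrain, so the quality of the uncertainty estimates directly controls how much confirmation bias can accumulate over rounds.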
Related papers
- An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes [46.663081214928226]
We propose an unbiased risk estimator with theoretical guarantees for partial label learning with augmented classes (PLLAC).
We provide a theoretical analysis of the estimation error bound of PLLAC.
Experiments on benchmark, UCI and real-world datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-09-29T07:36:16Z)
- A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL.
We propose a lightweight channel-based ensemble method that consolidates multiple inferior PLs into a single pseudo-label that is theoretically guaranteed to be unbiased and to have low variance.
arXiv Detail & Related papers (2024-03-27T09:49:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common approach is to assign pseudo-labels to unlabeled samples and then select positive and negative samples from the pseudo-labeled set to apply contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
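The pair-selection step described above can be sketched as a supervised contrastive loss over confidently pseudo-labeled samples. The names and the confidence threshold are assumptions, and CLAF's feature-augmentation component for minority classes is omitted:

```python
# Hedged sketch of contrastive pair selection from pseudo-labels; not CLAF's
# reference implementation.
import numpy as np

def pseudo_label_contrastive_loss(emb, pseudo_y, conf, tau=0.1, conf_min=0.95):
    """Supervised contrastive loss in which positives are pairs sharing a
    confident pseudo-label and all other samples act as negatives."""
    keep = conf >= conf_min                 # trust only confident pseudo-labels
    z = emb[keep] / np.linalg.norm(emb[keep], axis=1, keepdims=True)
    y = pseudo_y[keep]

    sim = z @ z.T / tau                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)          # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    pos_mask = y[:, None] == y[None, :]
    np.fill_diagonal(pos_mask, False)
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                       # anchors need at least one positive
    per_anchor = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return (-per_anchor[valid] / n_pos[valid]).mean()
```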
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends [26.79150786180822]
We unveil an intriguing yet long-overlooked observation in PUL.
Predictive trends for positive and negative classes display distinctly different patterns.
We propose a novel temporal point process (TPP)-inspired measure for trend detection and prove its unbiasedness in predicting changes.
arXiv Detail & Related papers (2023-10-06T08:06:15Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits popular pseudo-labeling methods via a unified sample-weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
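The trade-off can be illustrated with a hedged sketch of soft confidence weighting, in which low-confidence pseudo-labels are down-weighted smoothly rather than discarded by a hard threshold. The truncated-Gaussian form and its parameters below are assumptions rather than the paper's exact implementation:

```python
# Illustrative soft weighting of pseudo-labels by confidence.
import numpy as np

def soft_weights(confidence, mu, sigma):
    """Samples at or above the mean confidence mu get full weight; lower
    confidences decay smoothly instead of being dropped outright."""
    w = np.exp(-((confidence - mu) ** 2) / (2 * sigma ** 2))
    return np.where(confidence >= mu, 1.0, w)

# mu and sigma would typically be running statistics of the model's confidence
# on unlabeled data during training; they are fixed constants here for brevity.
conf = np.array([0.99, 0.80, 0.55, 0.30])
print(soft_weights(conf, mu=0.85, sigma=0.1))  # [1.0, 0.88, 0.011, ~0.0]
```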
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to erroneous high-confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop the Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes; a simplified sketch of the alignment idea follows this entry.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
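DARP itself solves a convex optimization; the sketch below substitutes Sinkhorn-style iterative proportional fitting (a deliberately swapped-in technique, not the paper's algorithm) to convey the core idea of rescaling soft pseudo-labels until their average matches an estimated class distribution:

```python
# Illustrative marginal alignment of soft pseudo-labels; not DARP's algorithm.
import numpy as np

def align_pseudo_labels(probs, target_dist, n_iters=50):
    """Alternately rescale columns toward the target class marginals and
    renormalize rows so each sample keeps a valid distribution."""
    q = probs.copy()
    for _ in range(n_iters):
        q *= target_dist / q.mean(axis=0)   # per-class correction
        q /= q.sum(axis=1, keepdims=True)   # per-sample renormalization
    return q

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])  # biased toward class 0
aligned = align_pseudo_labels(probs, target_dist=np.array([0.5, 0.5]))
print(aligned.mean(axis=0))  # close to the 0.5/0.5 target marginals
```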
This list is automatically generated from the titles and abstracts of the papers on this site.