Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning
- URL: http://arxiv.org/abs/2201.13192v3
- Date: Sun, 10 Mar 2024 13:49:43 GMT
- Title: Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning
- Authors: Emilio Dorigatti, Jann Goschenhofer, Benjamin Schubert, Mina Rezaei,
Bernd Bischl
- Abstract summary: We propose to tackle the issues of imbalanced datasets and model calibration in a positive-unlabeled learning setting.
By boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set.
In a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings.
- Score: 10.014356492742074
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positive-unlabeled learning (PUL) aims at learning a binary classifier from
only positive and unlabeled training data. Even though real-world applications
often involve imbalanced datasets where the majority of examples belong to one
class, most contemporary approaches to PUL do not investigate performance in
this setting, thus severely limiting their applicability in practice. In this
work, we thus propose to tackle the issues of imbalanced datasets and model
calibration in a PUL setting through an uncertainty-aware pseudo-labeling
procedure (PUUPL): by boosting the signal from the minority class,
pseudo-labeling expands the labeled dataset with new samples from the unlabeled
set, while explicit uncertainty quantification prevents the emergence of
harmful confirmation bias, leading to increased predictive performance. In a
series of experiments, PUUPL yields substantial performance gains in highly
imbalanced settings while also showing strong performance against recent
baselines in balanced PU scenarios. We furthermore provide ablations and
sensitivity analyses to shed light on PUUPL's individual components. Finally, a
real-world application with an imbalanced dataset confirms the advantage of our
approach.
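The selection loop at the heart of this approach can be pictured with a minimal sketch, assuming a bootstrap ensemble whose disagreement acts as the uncertainty estimate. Function names, hyperparameters, and the use of logistic regression are illustrative assumptions; the paper's actual method uses deep ensembles and additional machinery (e.g., PU risk estimation and label re-assignment) that is omitted here.

```python
# Minimal sketch of uncertainty-aware pseudo-label selection for PU learning.
# Everything here is an illustrative assumption, not the authors' reference code.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(X_pos, X_unl, n_models=5, n_select=100, rng=None):
    """One round: provisionally treat unlabeled data as negatives (a common PU
    simplification; the paper trains with a PU risk estimator instead), fit a
    bootstrap ensemble, and pseudo-label only the lowest-uncertainty samples."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])

    # Disagreement across bootstrap members is the epistemic-uncertainty proxy
    # used to rank candidate pseudo-labels.
    probs = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        probs.append(clf.predict_proba(X_unl)[:, 1])
    probs = np.stack(probs)            # shape: (n_models, n_unlabeled)

    mean_p = probs.mean(axis=0)
    uncertainty = probs.std(axis=0)    # ensemble disagreement

    # Keep only the most certain unlabeled points and assign their predicted label.
    chosen = np.argsort(uncertainty)[:n_select]
    pseudo_labels = (mean_p[chosen] >= 0.5).astype(int)
    return chosen, pseudo_labels, uncertainty
```

Each round would move the selected samples into the labeled set and retrain, so the quality of the uncertainty estimates directly controls how much confirmation bias can accumulate over rounds.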
Related papers
- An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes [46.663081214928226]
We propose an unbiased risk estimator with theoretical guarantees for partial label learning with augmented classes (PLLAC).
We provide a theoretical analysis of the estimation error bound of PLLAC.
Experiments on benchmark, UCI and real-world datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-09-29T07:36:16Z)
- A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL.
We propose a lightweight channel-based ensemble method that consolidates multiple inferior PLs into a single pseudo-label that is theoretically guaranteed to be unbiased and to have low variance.
arXiv Detail & Related papers (2024-03-27T09:49:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common approach is to assign pseudo-labels to unlabeled samples and then select positive and negative samples from the pseudo-labeled set to apply contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
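The pair-selection step described above can be sketched as a supervised contrastive loss over confidently pseudo-labeled samples. The names and the confidence threshold are assumptions, and CLAF's feature-augmentation component for minority classes is omitted:

```python
# Hedged sketch of contrastive pair selection from pseudo-labels; not CLAF's
# reference implementation.
import numpy as np

def pseudo_label_contrastive_loss(emb, pseudo_y, conf, tau=0.1, conf_min=0.95):
    """Supervised contrastive loss in which positives are pairs sharing a
    confident pseudo-label and all other samples act as negatives."""
    keep = conf >= conf_min                 # trust only confident pseudo-labels
    z = emb[keep] / np.linalg.norm(emb[keep], axis=1, keepdims=True)
    y = pseudo_y[keep]

    sim = z @ z.T / tau                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)          # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    pos_mask = y[:, None] == y[None, :]
    np.fill_diagonal(pos_mask, False)
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                       # anchors need at least one positive
    per_anchor = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return (-per_anchor[valid] / n_pos[valid]).mean()
```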
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends [26.79150786180822]
We unveil an intriguing yet long-overlooked observation in PUL.
Predictive trends for positive and negative classes display distinctly different patterns.
We propose a novel temporal point process (TPP)-inspired measure for trend detection and prove its unbiasedness in predicting changes.
arXiv Detail & Related papers (2023-10-06T08:06:15Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits popular pseudo-labeling methods via a unified sample-weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
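The trade-off can be illustrated with a hedged sketch of soft confidence weighting, in which low-confidence pseudo-labels are down-weighted smoothly rather than discarded by a hard threshold. The truncated-Gaussian form and its parameters below are assumptions rather than the paper's exact implementation:

```python
# Illustrative soft weighting of pseudo-labels by confidence.
import numpy as np

def soft_weights(confidence, mu, sigma):
    """Samples at or above the mean confidence mu get full weight; lower
    confidences decay smoothly instead of being dropped outright."""
    w = np.exp(-((confidence - mu) ** 2) / (2 * sigma ** 2))
    return np.where(confidence >= mu, 1.0, w)

# mu and sigma would typically be running statistics of the model's confidence
# on unlabeled data during training; they are fixed constants here for brevity.
conf = np.array([0.99, 0.80, 0.55, 0.30])
print(soft_weights(conf, mu=0.85, sigma=0.1))  # [1.0, 0.88, 0.011, ~0.0]
```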
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to erroneous high-confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop the Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes; a simplified sketch of the alignment idea follows this entry.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
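DARP itself solves a convex optimization; the sketch below substitutes Sinkhorn-style iterative proportional fitting (a deliberately swapped-in technique, not the paper's algorithm) to convey the core idea of rescaling soft pseudo-labels until their average matches an estimated class distribution:

```python
# Illustrative marginal alignment of soft pseudo-labels; not DARP's algorithm.
import numpy as np

def align_pseudo_labels(probs, target_dist, n_iters=50):
    """Alternately rescale columns toward the target class marginals and
    renormalize rows so each sample keeps a valid distribution."""
    q = probs.copy()
    for _ in range(n_iters):
        q *= target_dist / q.mean(axis=0)   # per-class correction
        q /= q.sum(axis=1, keepdims=True)   # per-sample renormalization
    return q

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])  # biased toward class 0
aligned = align_pseudo_labels(probs, target_dist=np.array([0.5, 0.5]))
print(aligned.mean(axis=0))  # close to the 0.5/0.5 target marginals
```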
This list is automatically generated from the titles and abstracts of the papers on this site.