Robust and Efficient Imbalanced Positive-Unlabeled Learning with
Self-supervision
- URL: http://arxiv.org/abs/2209.02459v1
- Date: Tue, 6 Sep 2022 12:54:59 GMT
- Title: Robust and Efficient Imbalanced Positive-Unlabeled Learning with
Self-supervision
- Authors: Emilio Dorigatti, Jonas Schweisthal, Bernd Bischl, Mina Rezaei
- Abstract summary: We present ImPULSeS, a unified representation learning framework for Imbalanced Positive-Unlabeled Learning with Self-Supervision.
We performed different experiments across multiple datasets to show that ImPULSeS is able to halve the error rate of the previous state-of-the-art.
- Score: 1.5675763601034223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from positive and unlabeled (PU) data is a setting where the learner
only has access to positive and unlabeled samples while having no information
on negative examples. Such a PU setting is of great importance in various tasks
such as medical diagnosis, social network analysis, financial markets analysis,
and knowledge base completion, which also tend to be intrinsically imbalanced,
i.e., where most examples are actually negatives. Most existing approaches for
PU learning, however, only consider artificially balanced datasets and it is
unclear how well they perform in the realistic scenario of imbalanced and
long-tailed data distributions. This paper proposes to tackle this challenge via
robust and efficient self-supervised pretraining. However, conventional
self-supervised learning methods need to be reformulated when applied to a
highly imbalanced PU distribution. In this paper, we
present \textit{ImPULSeS}, a unified representation learning framework for
\underline{Im}balanced \underline{P}ositive \underline{U}nlabeled
\underline{L}earning leveraging \underline{Se}lf-\underline{S}upervised debiased
pre-training. ImPULSeS uses a generic combination of large-scale unsupervised
learning with debiased contrastive loss and additional reweighted PU loss. We
performed different experiments across multiple datasets to show that ImPULSeS
is able to halve the error rate of the previous state-of-the-art, even compared
with previous methods that are given the true prior. Moreover, our method
showed increased robustness to prior misspecification and superior performance
even when pretraining was performed on an unrelated dataset. We anticipate such
robustness and efficiency will make it much easier for practitioners to obtain
excellent results on other PU datasets of interest. The source code is
available at \url{https://github.com/JSchweisthal/ImPULSeS}
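To make the combination described above concrete, the sketch below shows a generic prior-reweighted, non-negative PU risk (in the style of nnPU), which is the kind of "reweighted PU loss" the abstract refers to. It is a minimal illustration, not the authors' implementation (see the linked repository for that); the class prior `pi_p`, the sigmoid surrogate loss, and the function name are assumptions.

```python
# Minimal sketch, not the authors' code: a prior-reweighted, non-negative
# PU risk in the style of nnPU. The class prior `pi_p`, the sigmoid
# surrogate loss, and all names are illustrative assumptions.
import torch

def reweighted_pu_loss(logits: torch.Tensor,
                       is_positive: torch.Tensor,
                       pi_p: float = 0.05) -> torch.Tensor:
    """logits: (N,) raw scores; is_positive: (N,) boolean mask of labeled positives.

    Assumes each batch contains both labeled positives and unlabeled samples.
    """
    pos, unl = logits[is_positive], logits[~is_positive]
    # Sigmoid surrogate loss l(z, y) = sigmoid(-y * z).
    risk_pos_as_pos = torch.sigmoid(-pos).mean()  # positives scored as positive
    risk_pos_as_neg = torch.sigmoid(pos).mean()   # positives scored as negative
    risk_unl_as_neg = torch.sigmoid(unl).mean()   # unlabeled scored as negative
    # Estimate the negative-class risk from the unlabeled data, debias it with
    # the labeled positives, and clamp at zero (the non-negative correction).
    neg_risk = torch.clamp(risk_unl_as_neg - pi_p * risk_pos_as_neg, min=0.0)
    return pi_p * risk_pos_as_pos + neg_risk
```

In the pipeline the abstract describes, a loss of this kind would be applied on top of an encoder pretrained with a debiased contrastive objective; the repository above contains the actual formulation used by ImPULSeS.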
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework [12.734559823650887]
In the presence of distribution shifts, fair machine learning models may behave unfairly on test data.
Existing algorithms require full access to the data and cannot be used with small batches.
This paper proposes the first distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph.
arXiv Detail & Related papers (2023-09-20T23:25:28Z)
- Uncertainty Voting Ensemble for Imbalanced Deep Regression [20.176217123752465]
In this paper, we introduce UVOTE, a method for learning from imbalanced data.
We replace traditional regression losses with negative log-likelihood, which also predicts sample-wise aleatoric uncertainty (a generic sketch of such a loss appears after this list).
We show that UVOTE consistently outperforms the prior art, while at the same time producing better-calibrated uncertainty estimates.
arXiv Detail & Related papers (2023-05-24T14:12:21Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data while encouraging them to disagree on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
It causes severe biases of the head classes (with majority samples) against the tailed ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
arXiv Detail & Related papers (2022-01-08T07:48:36Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) strategies are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets [10.870831090350402]
We improve Positive Unlabeled (PU) learning over state-of-the-art from two aspects.
First, we propose an unbiased practical AUL estimation method, which makes use of raw PU data without prior knowledge of unlabeled samples.
Secondly, we propose ProbTagging, a new training method for extremely imbalanced data sets.
arXiv Detail & Related papers (2020-04-21T08:32:57Z)
- MixPUL: Consistency-based Augmentation for Positive and Unlabeled Learning [8.7382177147041]
We propose a simple yet effective data augmentation method, coined MixPUL, based on consistency regularization.
MixPUL incorporates supervised and unsupervised consistency training to generate augmented data.
We show that MixPUL achieves an average improvement in classification error from 16.49 to 13.09 on the CIFAR-10 dataset across different amounts of positive data.
arXiv Detail & Related papers (2020-04-20T15:43:33Z)
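The UVOTE entry above mentions replacing standard regression losses with a negative log-likelihood that also predicts sample-wise aleatoric uncertainty. As a generic illustration only (not UVOTE's actual implementation), a heteroscedastic Gaussian NLL in which the model outputs a mean and a log-variance per sample can be written as:

```python
# Generic heteroscedastic Gaussian NLL sketch (illustration, not UVOTE's code):
# the model predicts a mean `mu` and a log-variance `log_var` for each sample.
import torch

def gaussian_nll(mu: torch.Tensor, log_var: torch.Tensor,
                 target: torch.Tensor) -> torch.Tensor:
    # 0.5 * [log sigma^2 + (y - mu)^2 / sigma^2], dropping the constant term.
    return 0.5 * (log_var + (target - mu) ** 2 * torch.exp(-log_var)).mean()
```

The predicted variance provides the per-sample aleatoric uncertainty that an ensemble could then weight or vote with.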