Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction
- URL: http://arxiv.org/abs/2308.00279v1
- Date: Tue, 1 Aug 2023 04:34:52 GMT
- Title: Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction
- Authors: Zhangchi Zhu, Lu Wang, Pu Zhao, Chao Du, Wei Zhang, Hang Dong, Bo
Qiao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
- Abstract summary: Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
- Score: 48.929877651182885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from positive and unlabeled data is known as positive-unlabeled (PU)
learning in the literature and has attracted much attention in recent years. One
common approach in PU learning is to sample a set of pseudo-negatives from the
unlabeled data using ad-hoc thresholds so that conventional supervised methods
can be applied with both positive and negative samples. Owing to the label
uncertainty among the unlabeled data, errors from misclassifying unlabeled
positive samples as negative ones inevitably appear and may even accumulate
during the training process. Those errors often lead to performance
degradation and model instability. To mitigate the impact of label uncertainty
and improve the robustness of learning with positive and unlabeled data, we
propose a new robust PU learning method with a training strategy motivated by
the nature of human learning: easy cases should be learned first. A similar
intuition is used in curriculum learning, which trains on only the easier cases
in the early stage before introducing more complex ones.
Specifically, we utilize a novel ``hardness'' measure to distinguish unlabeled
samples with a high chance of being negative from unlabeled samples with large
label noise. An iterative training strategy is then implemented to fine-tune
the selection of negative samples during training, favoring ``easy'' samples
in the early stages.
Extensive experimental validations over a wide range of learning tasks show
that this approach can effectively improve the accuracy and stability of
learning with positive and unlabeled data. Our code is available at
https://github.com/woriazzc/Robust-PU
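The easy-first selection loop described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the hardness proxy (the current model's predicted positive probability), the linear growth schedule, the plain logistic-regression model, and the synthetic 2-D data are all assumptions made here for the sketch; the paper defines its own hardness measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy PU data (hypothetical synthetic setup): positives are labeled,
# while the unlabeled pool mixes hidden positives with true negatives.
pos = rng.normal(loc=2.0, size=(100, 2))
unl = np.vstack([
    rng.normal(loc=2.0, size=(40, 2)),    # hidden positives (label noise)
    rng.normal(loc=-2.0, size=(160, 2)),  # true negatives
])

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain logistic regression via gradient descent (stand-in model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_pos_prob(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Bootstrap: naively treat every unlabeled sample as negative once.
w = fit_logreg(np.vstack([pos, unl]),
               np.r_[np.ones(len(pos)), np.zeros(len(unl))])

# Iterative self-correction: each round, keep only the "easiest"
# pseudo-negatives (lowest predicted positive probability) and grow
# the selected fraction as training progresses, never using all of
# the unlabeled pool as negatives.
for t in range(1, 5):
    hardness = predict_pos_prob(w, unl)       # proxy hardness score
    k = int(len(unl) * min(0.8, 0.25 * t))    # assumed curriculum schedule
    easy_neg = unl[np.argsort(hardness)[:k]]
    w = fit_logreg(np.vstack([pos, easy_neg]),
                   np.r_[np.ones(len(pos)), np.zeros(len(easy_neg))])
```

On this well-separated toy data, the final model scores points near the positive cluster above 0.5 and points near the negative cluster below it, while the hidden positives in the unlabeled pool are progressively excluded from the pseudo-negative set.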
Related papers
- Importance of negative sampling in weak label learning [33.97406573051897]
Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances.
We study several sampling strategies that can measure the usefulness of negative instances for weak-label learning and select them accordingly.
Our work reveals that negative instances are not all equally irrelevant, and selecting them wisely can benefit weak-label learning.
arXiv Detail & Related papers (2023-09-23T01:11:15Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Positive Unlabeled Contrastive Learning [14.975173394072053]
We extend the self-supervised pretraining paradigm to the classical positive unlabeled (PU) setting.
We develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme.
Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets.
arXiv Detail & Related papers (2022-06-01T20:16:32Z)
- Adaptive Positive-Unlabelled Learning via Markov Diffusion [0.0]
Positive-Unlabelled (PU) learning is the machine learning setting in which only a set of positive instances is labelled.
The principal aim of the algorithm is to identify a set of instances which are likely to contain positive instances that were originally unlabelled.
arXiv Detail & Related papers (2021-08-13T10:25:47Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to the unlabeled data, which are then treated as noisy labels for training a deep neural network.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets [10.870831090350402]
We improve Positive Unlabeled (PU) learning over the state of the art in two aspects.
First, we propose an unbiased practical AUL estimation method, which makes use of raw PU data without prior knowledge of unlabeled samples.
Second, we propose ProbTagging, a new training method for extremely imbalanced data sets.
arXiv Detail & Related papers (2020-04-21T08:32:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.