Beyond Myopia: Learning from Positive and Unlabeled Data through
Holistic Predictive Trends
- URL: http://arxiv.org/abs/2310.04078v1
- Date: Fri, 6 Oct 2023 08:06:15 GMT
- Title: Beyond Myopia: Learning from Positive and Unlabeled Data through
Holistic Predictive Trends
- Authors: Xinrui Wang and Wenhai Wan and Chuanxin Geng and Shaoyuan LI and
Songcan Chen
- Abstract summary: We unveil an intriguing yet long-overlooked observation in PUL.
Predictive trends for positive and negative classes display distinctly different patterns.
We propose a novel TPP-inspired measure for trend detection and prove its asymptotic unbiasedness in predicting changes.
- Score: 26.79150786180822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning binary classifiers from positive and unlabeled data (PUL) is vital
in many real-world applications, especially when verifying negative examples is
difficult. Despite the impressive empirical performance of recent PUL methods,
challenges like accumulated errors and increased estimation bias persist due to
the absence of negative labels. In this paper, we unveil an intriguing yet
long-overlooked observation in PUL: \textit{resampling the positive data in
each training iteration to ensure a balanced distribution between positive and
unlabeled examples results in strong early-stage performance. Furthermore,
predictive trends for positive and negative classes display distinctly
different patterns.} Specifically, the scores (output probability) of unlabeled
negative examples consistently decrease, while those of unlabeled positive
examples show largely chaotic trends. Instead of focusing on classification
within individual time frames, we innovatively adopt a holistic approach,
interpreting the scores of each example as a temporal point process (TPP). This
reformulates the core problem of PUL as recognizing trends in these scores. We
then propose a novel TPP-inspired measure for trend detection and prove its
asymptotic unbiasedness in predicting changes. Notably, our method accomplishes
PUL without requiring additional parameter tuning or prior assumptions,
offering an alternative perspective for tackling this problem. Extensive
experiments verify the superiority of our method, particularly in a highly
imbalanced real-world setting, where it achieves improvements of up to $11.3\%$
in key metrics. The code is available at
\href{https://github.com/wxr99/HolisticPU}{https://github.com/wxr99/HolisticPU}.
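As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below records each unlabeled example's predicted positive probability across training iterations and classifies the example by the trend of that score sequence. The trend statistic here (the mean sign of consecutive score changes) is a simplified stand-in for the paper's TPP-inspired measure; the function names, the loader interface, and the threshold value are illustrative assumptions. See the linked repository for the actual method.

```python
# Sketch only: a simplified proxy for the paper's TPP-inspired trend measure.
import numpy as np
import torch


def record_scores(model, unlabeled_loader, device="cpu"):
    """Return sigmoid scores for every unlabeled example at the current iteration."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in unlabeled_loader:
            logits = model(x.to(device)).squeeze(-1)
            scores.append(torch.sigmoid(logits).cpu())
    return torch.cat(scores).numpy()


def trend_statistic(score_history):
    """score_history: (num_iterations, num_unlabeled) array of recorded scores.
    Returns the mean sign of consecutive score changes per example;
    values near -1 indicate a consistently decreasing trend."""
    diffs = np.diff(score_history, axis=0)   # shape (T-1, N)
    return np.sign(diffs).mean(axis=0)       # shape (N,)


def split_unlabeled_by_trend(score_history, threshold=0.0):
    """Mark unlabeled examples whose scores trend downward as pseudo-negative.
    The threshold of 0.0 is an illustrative choice, not taken from the paper."""
    return trend_statistic(score_history) < threshold
```

In use, one would resample the labeled positives each iteration so that mini-batches are balanced against the unlabeled pool, call record_scores after every iteration (or epoch), stack the results into score_history, and finally train a standard binary classifier on the positives together with the pseudo-labeled unlabeled examples.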
Related papers
- Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z) - Contrastive Learning with Negative Sampling Correction [52.990001829393506]
We propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).
PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct the bias in the contrastive loss.
PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks.
arXiv Detail & Related papers (2024-01-13T11:18:18Z) - Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Robust Positive-Unlabeled Learning via Noise Negative Sample
Self-correction [48.929877651182885]
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
arXiv Detail & Related papers (2023-08-01T04:34:52Z) - Fairness Improves Learning from Noisily Labeled Long-Tailed Data [119.0612617460727]
Long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning.
We introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations.
We show that the introduced fairness regularizer improves the performance of sub-populations on the tail as well as the overall learning performance.
arXiv Detail & Related papers (2023-03-22T03:46:51Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning [10.014356492742074]
We propose to tackle the issues of imbalanced datasets and model calibration in a positive-unlabeled learning setting.
By boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set.
Within a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings.
arXiv Detail & Related papers (2022-01-31T12:55:47Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Collective Loss Function for Positive and Unlabeled Learning [19.058269616452545]
We propose a collective loss function, termed cPU, to learn from only positive and unlabeled data.
Results show that cPU consistently outperforms the current state-of-the-art PU learning methods.
arXiv Detail & Related papers (2020-05-06T03:30:22Z) - MixPUL: Consistency-based Augmentation for Positive and Unlabeled
Learning [8.7382177147041]
We propose a simple yet effective data augmentation method, coined MixPUL, based on consistency regularization.
MixPUL incorporates supervised and unsupervised consistency training to generate augmented data (a hedged sketch of the idea follows this list).
We show that MixPUL reduces the average classification error from 16.49 to 13.09 on the CIFAR-10 dataset across different amounts of positive data.
arXiv Detail & Related papers (2020-04-20T15:43:33Z)
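The MixPUL entry above describes consistency-based augmentation only at a high level, so the following is a hedged sketch of one way mixup-style consistency training could be applied to positive and unlabeled data; it is not the authors' algorithm. The function names, the Beta parameter, and the use of sigmoid pseudo-targets for unlabeled examples are assumptions made for illustration.

```python
# Loose illustration of mixup-style consistency augmentation for PU data,
# in the spirit of the MixPUL summary above; not the authors' implementation.
import torch
import torch.nn.functional as F


def mixup_pu_batch(model, x_pos, x_unl, alpha=0.75):
    """Build a mixed batch and its soft targets (hyperparameters are illustrative)."""
    with torch.no_grad():
        # Soft pseudo-targets for unlabeled examples from the current model.
        y_unl = torch.sigmoid(model(x_unl).squeeze(-1))
    # Labeled positives keep a hard label of 1.
    y_pos = torch.ones(x_pos.size(0), device=x_pos.device)

    x_all = torch.cat([x_pos, x_unl], dim=0)
    y_all = torch.cat([y_pos, y_unl], dim=0)

    # Convex combination of inputs and of their (soft) targets.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_all.size(0))
    x_mix = lam * x_all + (1 - lam) * x_all[perm]
    y_mix = lam * y_all + (1 - lam) * y_all[perm]
    return x_mix, y_mix


def consistency_loss(model, x_mix, y_mix):
    """Binary cross-entropy between predictions on mixed inputs and mixed targets."""
    logits = model(x_mix).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, y_mix)
```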
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.