Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
- URL: http://arxiv.org/abs/2002.10261v4
- Date: Mon, 9 Nov 2020 12:20:05 GMT
- Title: Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
- Authors: Zayd Hammoudeh and Daniel Lowd
- Abstract summary: This paper shows that PU learning is possible even with arbitrarily non-representative positive data, given unlabeled data from the source and target distributions.
We integrate this into two statistically consistent methods to address arbitrary positive bias.
Experimental results demonstrate our methods' effectiveness across numerous real-world datasets.
- Score: 11.663072799764542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Positive-unlabeled (PU) learning trains a binary classifier using only
positive and unlabeled data. A common simplifying assumption is that the
positive data is representative of the target positive class. This assumption
rarely holds in practice due to temporal drift, domain shift, and/or
adversarial manipulation. This paper shows that PU learning is possible even
with arbitrarily non-representative positive data given unlabeled data from the
source and target distributions. Our key insight is that only the negative
class's distribution need be fixed. We integrate this into two statistically
consistent methods to address arbitrary positive bias - one approach combines
negative-unlabeled learning with unlabeled-unlabeled learning while the other
uses a novel, recursive risk estimator. Experimental results demonstrate our
methods' effectiveness across numerous real-world datasets and forms of
positive bias, including disjoint positive class-conditional supports.
Additionally, we propose a general, simplified approach to address PU risk
estimation overfitting.
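For context on the risk-estimation overfitting mentioned above: the standard unbiased PU risk estimator contains a difference of empirical means that can be driven negative, which flexible models exploit. A common remedy is the non-negative correction of Kiryo et al. (2017), sketched below with a sigmoid surrogate loss. This is the standard prior estimator, shown only to illustrate the problem, not the paper's new simplified approach.

```python
# Minimal sketch of the standard non-negative PU (nnPU) risk estimator.
# The sigmoid surrogate loss and the use of PyTorch are illustrative choices.
import torch

def sigmoid_loss(scores: torch.Tensor, y: int) -> torch.Tensor:
    """Smooth surrogate for the 0-1 loss: l(z, y) = sigmoid(-y * z)."""
    return torch.sigmoid(-y * scores)

def nn_pu_risk(pos_scores: torch.Tensor,
               unl_scores: torch.Tensor,
               prior: float) -> torch.Tensor:
    """R = prior * E_p[l(g, +1)] + max(0, E_u[l(g, -1)] - prior * E_p[l(g, -1)]).

    Without the clamp, the bracketed negative-class term can go below zero,
    which is the overfitting signature in empirical PU risk estimation.
    """
    pos_risk = prior * sigmoid_loss(pos_scores, +1).mean()
    neg_risk = (sigmoid_loss(unl_scores, -1).mean()
                - prior * sigmoid_loss(pos_scores, -1).mean())
    return pos_risk + torch.clamp(neg_risk, min=0.0)
```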
Related papers
- PUAL: A Classifier on Trifurcate Positive-Unlabeled Data [29.617810881312867]
We propose a PU classifier with an asymmetric loss (PUAL).
We develop a kernel-based algorithm that enables PUAL to obtain a non-linear decision boundary.
Experiments on both simulated and real-world datasets show that PUAL achieves satisfactory classification on trifurcate data.
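As a rough sketch of the idea, and not PUAL's actual formulation (the paper defines the loss and its kernelization), an asymmetric surrogate gives the positive and unlabeled-as-negative terms of a PU objective different hinge slopes:

```python
# Hypothetical asymmetric hinge inside a standard unbiased PU risk.
# The slopes c_pos/c_neg and the decomposition are illustrative assumptions.
import torch

def asymmetric_hinge(scores: torch.Tensor, y: int,
                     c_pos: float = 1.0, c_neg: float = 0.5) -> torch.Tensor:
    """Hinge loss whose slope depends on the sign of the target label."""
    slope = c_pos if y > 0 else c_neg
    return slope * torch.clamp(1.0 - y * scores, min=0.0)

def pu_risk_asymmetric(pos_scores, unl_scores, prior):
    # Unbiased PU decomposition with the asymmetric surrogate.
    pos = prior * asymmetric_hinge(pos_scores, +1).mean()
    neg = (asymmetric_hinge(unl_scores, -1).mean()
           - prior * asymmetric_hinge(pos_scores, -1).mean())
    return pos + neg
```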
arXiv Detail & Related papers (2024-05-31T16:18:06Z) - Contrastive Learning with Negative Sampling Correction [52.990001829393506]
We propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).
PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct the bias in the contrastive loss.
PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks.
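This correction resembles the generic debiased-contrastive recipe, in which the negative term of an InfoNCE-style loss is re-estimated from unlabeled and positive similarities. A minimal sketch under that reading, where the positive-class prior tau_plus and temperature t are assumed hyperparameters and not necessarily PUCL's exact estimator:

```python
# PU-style correction of the negative term in an InfoNCE-type loss.
# tau_plus (prior of positives among the "negatives") is an assumption.
import math
import torch

def debiased_nce_loss(sim_pos: torch.Tensor, sim_unl: torch.Tensor,
                      tau_plus: float = 0.1, t: float = 0.5) -> torch.Tensor:
    """sim_pos: (B, M) similarities to known positives.
    sim_unl: (B, N) similarities to unlabeled samples used as negatives."""
    pos = torch.exp(sim_pos / t).mean(dim=1)   # (B,)
    unl = torch.exp(sim_unl / t).mean(dim=1)   # (B,)
    n = sim_unl.shape[1]
    # E_neg ~ (E_unl - tau_plus * E_pos) / (1 - tau_plus), the PU correction.
    neg_est = (unl - tau_plus * pos) / (1.0 - tau_plus)
    # Clamp at the theoretical minimum exp(-1/t) so the log stays defined.
    neg_est = torch.clamp(neg_est, min=math.exp(-1.0 / t))
    return (-torch.log(pos / (pos + n * neg_est))).mean()
```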
arXiv Detail & Related papers (2024-01-13T11:18:18Z) - Joint empirical risk minimization for instance-dependent
positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task.
The goal is to train a binary classification model from a dataset containing a labeled subset of the positives together with unlabeled instances.
The unlabeled set includes the remaining positives and all negative observations.
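A small sketch of what instance-dependent labeling means when PU data is built from fully labeled data: under the common SCAR assumption every positive is labeled with the same constant probability, whereas here the labeling propensity varies with the features, so the labeled positives are a biased sample. The logistic propensity below is an illustrative assumption, not the paper's mechanism:

```python
# Contrast SCAR with instance-dependent (biased) positive labeling.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # ground-truth labels

# SCAR: every positive is labeled with the same constant probability c.
scar_labeled = (y == 1) & (rng.random(1000) < 0.3)

# Instance-dependent: the propensity e(x) depends on the features.
propensity = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
biased_labeled = (y == 1) & (rng.random(1000) < propensity)

# Either way, the unlabeled set mixes the remaining positives and all negatives.
unlabeled = ~biased_labeled
```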
arXiv Detail & Related papers (2023-12-27T12:45:12Z) - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
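The reduction mentioned above is easy to state concretely: with K classes, an instance whose complementary label is k is a certain negative for the binary problem "class k vs. rest", while every other instance is unlabeled for that problem. A minimal sketch with illustrative names:

```python
# Decompose complementary labels into K negative-unlabeled binary problems.
import numpy as np

def to_negative_unlabeled(comp_labels: np.ndarray, num_classes: int) -> np.ndarray:
    """comp_labels[i] = k means instance i is known NOT to belong to class k.

    Returns an array of shape (num_classes, n) over {-1, 0}:
    -1 = certain negative for class k, 0 = unlabeled for class k.
    """
    nu_targets = np.zeros((num_classes, len(comp_labels)), dtype=int)
    for k in range(num_classes):
        nu_targets[k, comp_labels == k] = -1
    return nu_targets

# Instance 0 is "not class 0", instance 1 is "not class 2", and so on.
nu = to_negative_unlabeled(np.array([0, 2, 1, 0]), num_classes=3)
```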
arXiv Detail & Related papers (2023-11-27T02:59:17Z) - Robust Positive-Unlabeled Learning via Noise Negative Sample
Self-correction [48.929877651182885]
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
arXiv Detail & Related papers (2023-08-01T04:34:52Z) - Automatic Debiased Learning from Positive, Unlabeled, and Exposure Data [11.217084610985674]
We address the issue of binary classification from positive and unlabeled data (PU classification) with a selection bias in the positive data.
This scenario represents a conceptual framework for many practical applications, such as recommender systems.
We propose a method to identify the function of interest using a strong ignorability assumption and develop an "Automatic Debiased PUE" (ADPUE) learning method.
arXiv Detail & Related papers (2023-03-08T18:45:22Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
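One plausible reading of this consistency objective is a regularizer that pushes the predicted label distribution on unlabeled data toward the known class prior (pi, 1 - pi). The binary KL term below is an illustrative choice, not necessarily Dist-PU's exact loss:

```python
# Match the predicted positive proportion on unlabeled data to the prior pi.
import torch

def label_dist_consistency(unl_logits: torch.Tensor, prior: float) -> torch.Tensor:
    p_hat = torch.sigmoid(unl_logits).mean()   # predicted positive proportion
    pi = torch.tensor(prior)
    # KL( (pi, 1 - pi) || (p_hat, 1 - p_hat) ) between Bernoulli distributions.
    return (pi * torch.log(pi / p_hat)
            + (1 - pi) * torch.log((1 - pi) / (1 - p_hat)))
```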
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Classification from Positive and Biased Negative Data with Skewed
Labeled Posterior Probability [0.0]
We propose a new method to approach the positive and biased negative (PbN) classification problem.
We incorporate a method to correct the negative impact of skewed confidence, where confidence represents the posterior probability that the observed data are positive.
arXiv Detail & Related papers (2022-03-11T04:31:35Z) - A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to the unlabeled data, treats them as noisy labels, and trains a deep neural network on the resulting noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
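A high-level sketch of this two-step recipe: a warm-up model pseudo-labels the unlabeled set, and training then treats those labels as noisy. The MAE loss below is a standard noise-tolerant stand-in, not necessarily the loss this paper uses:

```python
# Step 1: pseudo-label the unlabeled set; step 2: train with a noise-tolerant loss.
import torch

def pseudo_label(model: torch.nn.Module, unl_x: torch.Tensor) -> torch.Tensor:
    """Hard predictions from a warm-up model become (noisy) labels in {0, 1}."""
    with torch.no_grad():
        return (torch.sigmoid(model(unl_x)).squeeze(-1) > 0.5).float()

def noise_tolerant_loss(logits: torch.Tensor,
                        noisy_targets: torch.Tensor) -> torch.Tensor:
    # MAE on probabilities is robust to symmetric label noise (Ghosh et al., 2017).
    probs = torch.sigmoid(logits).squeeze(-1)
    return (probs - noisy_targets).abs().mean()
```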
arXiv Detail & Related papers (2021-03-08T11:46:02Z) - MixPUL: Consistency-based Augmentation for Positive and Unlabeled
Learning [8.7382177147041]
We propose a simple yet effective data augmentation method, coined MixPUL, based on consistency regularization.
MixPUL incorporates supervised and unsupervised consistency training to generate augmented data.
We show that MixPUL reduces the average classification error from 16.49 to 13.09 on the CIFAR-10 dataset across different amounts of positive data.
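A minimal sketch of mixup-style consistency for PU data: interpolate a positive and an unlabeled example and require the prediction on the mixture to match the same interpolation of the targets, where the unlabeled target is the model's own soft guess. The Beta parameter and pairing scheme are illustrative assumptions:

```python
# Mixup-style consistency between positive and unlabeled examples.
import torch

def mixpu_consistency(model: torch.nn.Module,
                      pos_x: torch.Tensor,
                      unl_x: torch.Tensor,
                      alpha: float = 0.75) -> torch.Tensor:
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed = lam * pos_x + (1 - lam) * unl_x
    with torch.no_grad():
        # Positives have target 1; unlabeled use the model's soft prediction.
        target = lam * 1.0 + (1 - lam) * torch.sigmoid(model(unl_x))
    pred = torch.sigmoid(model(mixed))
    return ((pred - target) ** 2).mean()   # squared-error consistency
```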
arXiv Detail & Related papers (2020-04-20T15:43:33Z) - On Positive-Unlabeled Classification in GAN [130.43248168149432]
This paper defines a positive and unlabeled classification problem for standard GANs.
This formulation leads to a novel technique for stabilizing the training of the discriminator in GANs.
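One way to read this: real samples act as positives and generated samples as unlabeled, since high-quality fakes increasingly resemble real data, and the discriminator is trained with a non-negative PU risk in place of the usual real-vs-fake loss. The prior and logistic surrogate below are assumptions, not necessarily the paper's exact objective:

```python
# PU-style discriminator loss: real = positive, generated = unlabeled.
import torch
import torch.nn.functional as F

def pu_discriminator_loss(d_real: torch.Tensor,
                          d_fake: torch.Tensor,
                          prior: float = 0.3) -> torch.Tensor:
    """d_real / d_fake: discriminator logits on real and generated batches."""
    pos_risk = prior * F.softplus(-d_real).mean()      # logistic loss toward +1
    neg_risk = (F.softplus(d_fake).mean()              # unlabeled toward -1
                - prior * F.softplus(d_real).mean())
    # Non-negative correction, as in nnPU, keeps the risk estimate valid.
    return pos_risk + torch.clamp(neg_risk, min=0.0)
```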
arXiv Detail & Related papers (2020-02-04T05:59:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.