Risk bounds for PU learning under Selected At Random assumption
- URL: http://arxiv.org/abs/2201.06277v1
- Date: Mon, 17 Jan 2022 08:45:39 GMT
- Title: Risk bounds for PU learning under Selected At Random assumption
- Authors: Olivier Coudray (CELESTE), Christine Keribin (CELESTE), Pascal Massart
  (CELESTE), Patrick Pamphile (CELESTE)
- Abstract summary: Positive-unlabeled learning (PU learning) is known as a special case of semi-supervised binary classification where only a fraction of positive examples are labeled.
We provide a lower bound on minimax risk proving that the upper bound is almost optimal.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Positive-unlabeled learning (PU learning) is known as a special case of
semi-supervised binary classification where only a fraction of positive
examples are labeled. The challenge is then to find the correct classifier
despite this lack of information. Recently, new methodologies have been
introduced to address the case where the probability of being labeled may
depend on the covariates. In this paper, we are interested in establishing risk
bounds for PU learning under this general assumption. In addition, we quantify
the impact of label noise on PU learning compared to standard classification
setting. Finally, we provide a lower bound on minimax risk proving that the
upper bound is almost optimal.
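
To make the setting in the abstract concrete, the following is a minimal simulation sketch of PU data in which the probability of being labeled depends on the covariates. Every modelling choice here is an illustrative assumption and is not taken from the paper: the one-dimensional Gaussian covariate, the logistic class posterior, the logistic propensity, and the helper names `sigmoid` and `simulate_pu_sar` are all hypothetical.

```python
import math
import random


def sigmoid(t):
    """Logistic function, used here for both the posterior and the propensity."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_pu_sar(n, seed=0):
    """Draw n triples (x, y, s) of PU data with covariate-dependent labeling.

    x: covariate, y: true class (hidden in practice), s: 1 if labeled.
    The distributions below are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        eta = sigmoid(2.0 * x)   # assumed class posterior P(Y=1 | X=x)
        y = 1 if rng.random() < eta else 0
        e = sigmoid(x)           # assumed propensity P(S=1 | Y=1, X=x)
        # Only positives can be labeled; negatives always stay unlabeled.
        s = 1 if (y == 1 and rng.random() < e) else 0
        samples.append((x, y, s))
    return samples


samples = simulate_pu_sar(10_000)
n_pos = sum(y for _, y, _ in samples)
n_lab = sum(s for _, _, s in samples)
print(f"positives: {n_pos}, labeled positives: {n_lab}")
```

In this sketch a learner would observe only `x` and `s`. Replacing the propensity `e` with a constant would give the special case in which labeling does not depend on the covariates.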
 
      
Related papers
- Learning from positive and unlabeled examples - Finite size sample bounds [5.015294654768579]
 PU learning is a variant of supervised classification learning in which the only labels revealed to the learner are of positively labeled instances.
This paper provides a theoretical analysis of the statistical complexity of PU learning under a wider range of setups.
 arXiv  Detail & Related papers  (2025-07-10T00:39:40Z)
- ScarceGAN: Discriminative Classification Framework for Rare Class Identification for Longitudinal Data with Weak Prior [4.2944491746735745]
 ScarceGAN focuses on identification of extremely rare or scarce samples from longitudinal telemetry data with small and weak label prior.
We specifically address: (i) severe scarcity in positive class, stemming from both underlying organic skew in the data, as well as extremely limited labels.
For identifying risky players in skill gaming, this formulation in whole gives us a recall of over 85% (60% jump over vanilla semi-supervised GAN) on our scarce class with very minimal verbosity in the unknown space.
 arXiv  Detail & Related papers  (2025-05-02T12:17:37Z)
- An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes [46.663081214928226]
 We propose an unbiased risk estimator with theoretical guarantees for PLLAC.
We provide a theoretical analysis of the estimation error bound of PLLAC.
Experiments on benchmark, UCI and real-world datasets demonstrate the effectiveness of the proposed approach.
 arXiv  Detail & Related papers  (2024-09-29T07:36:16Z)
- Verifying the Selected Completely at Random Assumption in Positive-Unlabeled Learning [0.7646713951724013]
 We propose a relatively simple and computationally fast test that can be used to determine whether the observed data meet the SCAR assumption.
Our test is based on generating artificial labels conforming to the SCAR case, which in turn allows us to mimic the distribution of the test statistic under the null hypothesis of SCAR.
 arXiv  Detail & Related papers  (2024-03-29T20:36:58Z)
- Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data [28.74519165747641]
 We study the problem of Positive Unlabeled (PU) learning, where only a small set of labeled positives and a large unlabeled pool are available.
We introduce Positive Unlabeled Contrastive Learning (puCL), an unbiased and variance reducing contrastive objective.
When the class prior is known, we propose Positive Unlabeled InfoNCE (puNCE), a prior-aware extension that re-weights unlabeled samples as soft positive negative mixtures.
 arXiv  Detail & Related papers  (2024-02-08T20:20:54Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
 Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
 arXiv  Detail & Related papers  (2023-11-27T02:59:17Z)
- Can Class-Priors Help Single-Positive Multi-Label Learning? [40.312419865957224]
 Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem.
Class-priors estimator is introduced, which could estimate the class-priors that are theoretically guaranteed to converge to the ground-truth class-priors.
Based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer could be guaranteed to approximately converge to the optimal risk minimizer on fully supervised data.
 arXiv  Detail & Related papers  (2023-09-25T05:45:57Z)
- Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation when the SCAR assumption does not hold [2.76815720120527]
 Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification.
PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain.
We propose two PU learning algorithms to estimate $\alpha$, calculate probabilities for PU instances, and improve classification metrics.
 arXiv  Detail & Related papers  (2023-03-14T23:16:22Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
 We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
 Experiments on three benchmark datasets validate the effectiveness of the proposed method.
 arXiv  Detail & Related papers  (2022-12-06T07:38:29Z)
- Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
 In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained as such tends to cause overfitting as its empirical risks go negative during training.
 Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
 arXiv  Detail & Related papers  (2022-07-04T16:22:44Z)
- Positive Unlabeled Contrastive Learning [14.975173394072053]
 We extend the self-supervised pretraining paradigm to the classical positive unlabeled (PU) setting.
We develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme.
Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets.
 arXiv  Detail & Related papers  (2022-06-01T20:16:32Z)
- Learning with Proper Partial Labels [87.65718705642819]
 Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
 arXiv  Detail & Related papers  (2021-12-23T01:37:03Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
 We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
 arXiv  Detail & Related papers  (2021-05-01T17:05:29Z)
- Cautious Active Clustering [79.23797234241471]
 We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space.
Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class.
 arXiv  Detail & Related papers  (2020-08-03T23:47:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     