Risk bounds for PU learning under Selected At Random assumption
- URL: http://arxiv.org/abs/2201.06277v1
- Date: Mon, 17 Jan 2022 08:45:39 GMT
- Title: Risk bounds for PU learning under Selected At Random assumption
- Authors: Olivier Coudray (CELESTE), Christine Keribin (CELESTE), Pascal Massart (CELESTE), Patrick Pamphile (CELESTE)
- Abstract summary: Positive-unlabeled learning (PU learning) is known as a special case of semi-supervised binary classification where only a fraction of positive examples are labeled.
We provide a lower bound on minimax risk proving that the upper bound is almost optimal.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Positive-unlabeled learning (PU learning) is known as a special case of
semi-supervised binary classification where only a fraction of positive
examples are labeled. The challenge is then to find the correct classifier
despite this lack of information. Recently, new methodologies have been
introduced to address the case where the probability of being labeled may
depend on the covariates. In this paper, we are interested in establishing risk
bounds for PU learning under this general assumption. In addition, we quantify
the impact of label noise on PU learning compared to the standard classification
setting. Finally, we provide a lower bound on minimax risk proving that the
upper bound is almost optimal.
Related papers
- An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes [46.663081214928226]
We propose an unbiased risk estimator with theoretical guarantees for PLLAC.
We provide a theoretical analysis of the estimation error bound of PLLAC.
Experiments on benchmark, UCI and real-world datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-09-29T07:36:16Z)
- Verifying the Selected Completely at Random Assumption in Positive-Unlabeled Learning [0.7646713951724013]
We propose a relatively simple and computationally fast test that can be used to determine whether the observed data meet the SCAR assumption.
Our test is based on generating artificial labels conforming to the SCAR case, which in turn allows us to mimic the distribution of the test statistic under the null hypothesis of SCAR.
arXiv Detail & Related papers (2024-03-29T20:36:58Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
- Can Class-Priors Help Single-Positive Multi-Label Learning? [40.312419865957224]
Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem.
A class-priors estimator is introduced whose estimates are theoretically guaranteed to converge to the ground-truth class-priors.
Based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer could be guaranteed to approximately converge to the optimal risk minimizer on fully supervised data.
arXiv Detail & Related papers (2023-09-25T05:45:57Z)
- Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation when the SCAR assumption does not hold [2.76815720120527]
Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification.
PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain.
We propose two PU learning algorithms to estimate $\alpha$, calculate probabilities for PU instances, and improve classification metrics.
arXiv Detail & Related papers (2023-03-14T23:16:22Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained as such tends to cause overfitting as its empirical risks go negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z)
- Positive Unlabeled Contrastive Learning [14.975173394072053]
We extend the self-supervised pretraining paradigm to the classical positive unlabeled (PU) setting.
We develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme.
Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets.
arXiv Detail & Related papers (2022-06-01T20:16:32Z)
- Learning with Proper Partial Labels [87.65718705642819]
Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Cautious Active Clustering [79.23797234241471]
We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space.
Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class.
arXiv Detail & Related papers (2020-08-03T23:47:31Z)