Related papers: Mixture Proportion Estimation and PU Learning: A Modern Approach

URL: http://arxiv.org/abs/2111.00980v1
Date: Mon, 1 Nov 2021 14:42:23 GMT
Title: Mixture Proportion Estimation and PU Learning: A Modern Approach
Authors: Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Abstract summary: Given only positive examples and unlabeled examples, we might hope to estimate an accurate positive-versus-negative classifier. Classical methods for the two subtasks, mixture proportion estimation and PU-learning, break down in high-dimensional settings. We propose two simple techniques: Best Bin Estimation (BBE) and Conditional Value Ignoring Risk (CVIR).
Score: 47.34499672878859
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm, (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier.
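
Illustrative sketch (not from the paper): the MPE subtask described above can be made concrete with a simple threshold-ratio argument. Because the unlabeled data is a mixture of positives (fraction alpha) and negatives, the fraction of unlabeled scores above any threshold is at least alpha times the fraction of positive scores above it, so the smallest such ratio bounds alpha from above and approaches it when a top slice of scores is purely positive. The function name `estimate_mixture_proportion` and the `min_pos_frac` guard are assumptions made for this example; the abstract does not specify how BBE selects its bin, so this should be read as a simplified stand-in rather than the authors' estimator.

```python
import numpy as np

def estimate_mixture_proportion(pos_scores, unl_scores, min_pos_frac=0.1):
    """Estimate alpha, the fraction of positives in the unlabeled set.

    For any threshold c, q_u(c) >= alpha * q_p(c), where q_p and q_u are the
    fractions of positive and unlabeled scores that are >= c. The minimum of
    q_u(c) / q_p(c) over thresholds is therefore an estimate of alpha.
    `min_pos_frac` (hypothetical guard) skips thresholds that keep too few
    positives, where the ratio is dominated by sampling noise.
    """
    pos_scores = np.asarray(pos_scores, dtype=float)
    unl_scores = np.asarray(unl_scores, dtype=float)
    best = 1.0  # alpha can never exceed 1
    for c in np.unique(pos_scores):  # candidate thresholds
        q_p = np.mean(pos_scores >= c)
        if q_p < min_pos_frac:
            continue
        q_u = np.mean(unl_scores >= c)
        best = min(best, q_u / q_p)
    return float(best)

# Toy scores from a hypothetical classifier that separates the classes cleanly:
# positives score in [0.5, 1.0], negatives in [0.0, 0.49].
pos = np.linspace(0.5, 1.0, 1000)
unl = np.concatenate([np.linspace(0.5, 1.0, 300),    # 30% positives
                      np.linspace(0.0, 0.49, 700)])  # 70% negatives
alpha_hat = estimate_mixture_proportion(pos, unl)
```

On this toy data the estimate lands close to the true positive fraction of 0.3. Taking a minimum over many thresholds biases the estimate downward on noisy samples, which is one reason a principled rule for choosing the bin matters.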

Related papers

Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning. One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class [10.937432407005124]
Adversarial Training (AT) is one of the most effective methods to enhance the robustness of Deep Neural Networks (DNNs). Existing AT methods suffer from an inherent accuracy-robustness trade-off. We propose a new AT paradigm by introducing an additional dummy class for each original class.
arXiv Detail & Related papers (2024-10-16T15:36:10Z)
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance. We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class. Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
Joint empirical risk minimization for instance-dependent positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task. The goal is to train a binary classification model on a dataset that contains a labeled subset of the positives together with unlabeled instances. The unlabeled set includes the remaining positives and all negative observations.
arXiv Detail & Related papers (2023-12-27T12:45:12Z)
Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints. A second motivation for the bias-constrained approach (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data. Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information. Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets [10.870831090350402]
We improve Positive Unlabeled (PU) learning over the state of the art in two respects. First, we propose an unbiased practical AUL estimation method, which makes use of raw PU data without prior knowledge of unlabeled samples. Second, we propose ProbTagging, a new training method for extremely imbalanced data sets.
arXiv Detail & Related papers (2020-04-21T08:32:57Z)
Learning from Positive and Unlabeled Data with Arbitrary Positive Shift [11.663072799764542]
This paper shows that PU learning is possible even with arbitrarily non-representative positive data given unlabeled data. We integrate this into two statistically consistent methods to address arbitrary positive bias. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets.
arXiv Detail & Related papers (2020-02-24T13:53:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.