Related papers: Augmented prediction of a true class for Positive Unlabeled data under selection bias

URL: http://arxiv.org/abs/2407.10309v1
Date: Sun, 14 Jul 2024 19:58:01 GMT
Title: Augmented prediction of a true class for Positive Unlabeled data under selection bias
Authors: Jan Mielniczuk, Adam Wawrzeńczyk
Abstract summary: We introduce a new observational setting for Positive Unlabeled (PU) data where the observations at prediction time are also labeled. We argue that the additional information is important for prediction, and call this task "augmented PU prediction". We introduce several variants of the empirical Bayes rule in this scenario and investigate their performance.
Score: 0.8594140167290099
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce a new observational setting for Positive Unlabeled (PU) data where the observations at prediction time are also labeled. This occurs commonly in practice -- we argue that the additional information is important for prediction, and call this task "augmented PU prediction". We allow for labeling to be feature dependent. In this scenario, the Bayes classifier and its risk are established and compared with the risk of a classifier which, for unlabeled data, is based only on predictors. We introduce several variants of the empirical Bayes rule in this scenario and investigate their performance. We emphasise the dangers (and ease) of applying a classical classification rule in the augmented PU scenario -- due to the lack of preexisting studies, an unaware researcher is prone to skewing the obtained predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for the PU scenario works on par with or better than the other considered variants and yields an advantage over feature-only based methods in terms of accuracy for unlabeled samples.
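The role of the observed label at prediction time can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual PU convention that a labeled observation (s = 1) is always a true positive, and that two quantities are already available: the class posterior P(Y=1 | x) and the propensity P(S=1 | Y=1, x). Under those assumptions, Bayes' theorem gives P(Y=1 | x, S=0) = (1 - e(x)) y(x) / (1 - e(x) y(x)). The names `y_prob`, `propensity`, `s`, and both function names are hypothetical.

```python
import numpy as np

def augmented_posterior(y_prob, propensity, s):
    """P(Y=1 | x, s) when the PU label s is observed at prediction time.

    y_prob:     assumed estimate of P(Y=1 | x)
    propensity: assumed estimate of e(x) = P(S=1 | Y=1, x)
    s:          observed PU label (1 = labeled, 0 = unlabeled)
    """
    y_prob = np.asarray(y_prob, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    s = np.asarray(s)
    # P(S=0 | x) = 1 - e(x) * y(x); clipped to avoid division by zero.
    p_unlabeled = np.clip(1.0 - propensity * y_prob, 1e-12, None)
    # Bayes' theorem for an unlabeled observation.
    post_unlabeled = (1.0 - propensity) * y_prob / p_unlabeled
    # A labeled observation is positive with certainty (PU assumption).
    return np.where(s == 1, 1.0, post_unlabeled)

def augmented_predict(y_prob, propensity, s, threshold=0.5):
    """Threshold the augmented posterior to obtain a 0/1 prediction."""
    return (augmented_posterior(y_prob, propensity, s) >= threshold).astype(int)

# Illustrative values only, not data from the paper.
y_prob = [0.6, 0.6, 0.2]
propensity = [0.5, 0.5, 0.5]
s = [1, 0, 0]
print(augmented_posterior(y_prob, propensity, s))
print(augmented_predict(y_prob, propensity, s))
```

In this toy input the second observation has P(Y=1 | x) = 0.6, so a feature-only rule with threshold 0.5 would predict the positive class, while conditioning on s = 0 lowers the posterior below 0.5. This is one way to see why ignoring the observed label can skew predictions for unlabeled samples.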

Related papers

Trustworthy Prediction with Gaussian Process Knowledge Scores [7.090362431002478]
Probabilistic models are often used to make predictions in regions of the data space where no observations are available. We propose a knowledge score for predictions that quantifies the extent to which observing data have reduced our uncertainty about a prediction. We demonstrate in several experiments that the knowledge score can anticipate when predictions from a GPR model are accurate.
arXiv Detail & Related papers (2025-06-23T13:36:06Z)
Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores [52.92618442300405]
It is impossible to achieve exact, distribution-free conditional coverage in finite samples. We propose an alternative conformal prediction algorithm that targets coverage where it matters most.
arXiv Detail & Related papers (2025-01-17T12:01:56Z)
Verifying the Selected Completely at Random Assumption in Positive-Unlabeled Learning [0.7646713951724013]
We propose a relatively simple and computationally fast test that can be used to determine whether the observed data meet the SCAR assumption. Our test is based on generating artificial labels conforming to the SCAR case, which in turn allows us to mimic the distribution of the test statistic under the null hypothesis of SCAR.
arXiv Detail & Related papers (2024-03-29T20:36:58Z)
Partial-Label Learning with a Reject Option [3.1201323892302444]
We propose a novel partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors.
arXiv Detail & Related papers (2024-02-01T13:41:44Z)
Conformal Prediction for Deep Classifier via Label Ranking [29.784336674173616]
Conformal prediction is a statistical framework that generates prediction sets with a desired coverage guarantee. We propose a novel algorithm named Sorted Adaptive Prediction Sets (SAPS). SAPS discards all the probability values except for the maximum softmax probability.
arXiv Detail & Related papers (2023-10-10T08:54:14Z)
Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
Selective Prediction via Training Dynamics [31.708701583736644]
We show that state-of-the-art selective prediction performance can be attained solely from studying the training dynamics of a model. In particular, we reject data points exhibiting too much disagreement with the final prediction at late stages in training. The proposed rejection mechanism is domain-agnostic (i.e., it works for both discrete and real-valued prediction) and can be flexibly combined with existing selective prediction approaches.
arXiv Detail & Related papers (2022-05-26T17:51:29Z)
Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next. In such settings, there is a distinct type of distribution shift between the training and test data. We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning. This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions. Used as a plug-in, ADS significantly improves state-of-the-art SSL methods.
arXiv Detail & Related papers (2021-12-15T15:17:02Z)
Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation [85.75352990739154]
We propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase.
arXiv Detail & Related papers (2021-07-11T13:36:53Z)
Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues. We first argue that label shift hurts UQ, by showing degradation in coverage and calibration. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design. A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift. Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data. We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.