Approximately Bayes-Optimal Pseudo Label Selection
- URL: http://arxiv.org/abs/2302.08883v5
- Date: Mon, 26 Jun 2023 11:27:39 GMT
- Title: Approximately Bayes-Optimal Pseudo Label Selection
- Authors: Julian Rodemann, Jann Goschenhofer, Emilio Dorigatti, Thomas Nagler,
Thomas Augustin
- Abstract summary: Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS).
Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions.
This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue.
- Score: 0.5249805590164901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning by self-training heavily relies on pseudo-label
selection (PLS). The selection often depends on the initial model fit on
labeled data. Early overfitting might thus be propagated to the final model by
selecting instances with overconfident but erroneous predictions, often
referred to as confirmation bias. This paper introduces BPLS, a Bayesian
framework for PLS that aims to mitigate this issue. At its core lies a
criterion for selecting instances to label: an analytical approximation of the
posterior predictive of pseudo-samples. We derive this selection criterion by
proving Bayes optimality of the posterior predictive of pseudo-samples. We
further overcome computational hurdles by approximating the criterion
analytically. Its relation to the marginal likelihood allows us to come up with
an approximation based on Laplace's method and the Gaussian integral. We
empirically assess BPLS for parametric generalized linear and non-parametric
generalized additive models on simulated and real-world data. When faced with
high-dimensional data prone to overfitting, BPLS outperforms traditional PLS
methods.
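The Laplace-based approximation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation and may differ from the paper's exact criterion. It assumes a Bayesian logistic regression with an isotropic Gaussian prior N(0, tau2 * I), approximates the log marginal likelihood as log p(D) ≈ log p(D | θ̂) + log p(θ̂) + (d/2) log(2π) − ½ log|H| (θ̂ the MAP estimate, H the negative Hessian of the log joint at θ̂), and scores each pseudo-labeled candidate by the standard identity that a log posterior predictive is a difference of log marginal likelihoods. All function names and the prior variance `tau2` are hypothetical choices made for this illustration.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))

def neg_hessian(X, theta, tau2):
    # Negative Hessian of the log joint: X^T W X + I / tau2, W = diag(p(1-p)).
    p = sigmoid(X @ theta)
    return X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(X.shape[1]) / tau2

def fit_map(X, y, tau2=1.0, n_iter=50, tol=1e-10):
    # Newton's method for the MAP estimate under the assumed Gaussian prior.
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (y - sigmoid(X @ theta)) - theta / tau2
        step = np.linalg.solve(neg_hessian(X, theta, tau2), grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def laplace_log_evidence(X, y, tau2=1.0):
    # Laplace approximation of the log marginal likelihood log p(y | X).
    d = X.shape[1]
    theta = fit_map(X, y, tau2)
    eta = X @ theta
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * theta @ theta / tau2 - 0.5 * d * np.log(2.0 * np.pi * tau2)
    _, logdet = np.linalg.slogdet(neg_hessian(X, theta, tau2))
    return log_lik + log_prior + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet

def select_pseudo_label(X_lab, y_lab, X_unl, tau2=1.0):
    # Pseudo-label each unlabeled point with the current MAP model, then score
    # it by the approximate log posterior predictive:
    #   log p(D + pseudo-sample) - log p(D).
    theta = fit_map(X_lab, y_lab, tau2)
    y_hat = (sigmoid(X_unl @ theta) >= 0.5).astype(float)
    base = laplace_log_evidence(X_lab, y_lab, tau2)
    scores = np.array([
        laplace_log_evidence(np.vstack([X_lab, x]), np.append(y_lab, y), tau2) - base
        for x, y in zip(X_unl, y_hat)
    ])
    return int(np.argmax(scores)), y_hat, scores

# Simulated data (hypothetical): 10 labeled and 30 unlabeled points in 3 dimensions.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(40, 3))
y_all = (rng.random(40) < sigmoid(X_all @ np.array([2.0, -1.0, 0.5]))).astype(float)
best, y_hat, scores = select_pseudo_label(X_all[:10], y_all[:10], X_all[10:])
print("selected index:", best, "pseudo-label:", y_hat[best])
```

In a self-training loop, the selected instance and its pseudo-label would be moved to the labeled set and the procedure repeated. Refitting the MAP estimate for every candidate is the simplest choice here, not the cheapest one.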
Related papers
- Predictive variational inference: Learn the predictively optimal posterior distribution [1.7648680700685022]
Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification.
We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density.
This framework applies to both likelihood-exact and likelihood-free models.
arXiv Detail & Related papers (2024-10-18T19:44:57Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias by either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- Pseudo Label Selection is a Decision Problem [0.0]
Pseudo-Labeling is a simple and effective approach to semi-supervised learning.
It requires criteria that guide the selection of pseudo-labeled data.
Overfitting can be propagated to the final model by choosing instances with overconfident but wrong predictions.
arXiv Detail & Related papers (2023-09-25T07:48:02Z)
- Correcting Model Bias with Sparse Implicit Processes [0.9187159782788579]
We show that Sparse Implicit Processes (SIP) is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model.
We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.
arXiv Detail & Related papers (2022-07-21T18:00:01Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Maximum sampled conditional likelihood for informative subsampling [4.708378681950648]
Subsampling is a computationally effective approach to extract information from massive data sets when computing resources are limited.
We propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data.
arXiv Detail & Related papers (2020-11-11T16:01:17Z)
- On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective [58.33065918353532]
We consider a back-projection (BP) based fidelity term as an alternative to the common least squares (LS) term.
We show that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms.
arXiv Detail & Related papers (2020-05-03T00:58:23Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
- On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.