Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss
- URL: http://arxiv.org/abs/2405.14681v2
- Date: Tue, 05 Nov 2024 11:34:30 GMT
- Title: Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss
- Authors: Yi-Shan Wu, Yijie Zhang, Badr-Eddine Chérief-Abdellatif, Yevgeny Seldin
- Abstract summary: PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning.
We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss.
- Score: 16.84312626844573
- License:
- Abstract: PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way remained elusive for PAC-Bayes. While PAC-Bayes allows construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used for the construction of the prior, whereas confidence information in the prior, which is related to the number of points used to construct the prior, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch. We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior plus the downscaled loss of the prior, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses, and which can be of independent interest. In empirical evaluation, the new procedure significantly outperforms the state of the art.
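To make the decomposition concrete, the identity described in the abstract can be sketched as follows; the symbols ρ (posterior), π (prior), γ (downscaling factor) and L (expected loss) are notation introduced here for illustration and may differ from the paper's own.

```latex
% The posterior loss is rewritten as an excess loss relative to a downscaled
% prior loss plus the downscaled prior loss itself; the second term is then
% bounded recursively using the bound already obtained for the prior.
\[
  \mathbb{E}_{h \sim \rho}\bigl[L(h)\bigr]
  \;=\;
  \underbrace{\Bigl(\mathbb{E}_{h \sim \rho}\bigl[L(h)\bigr] - \gamma\,\mathbb{E}_{h' \sim \pi}\bigl[L(h')\bigr]\Bigr)}_{\text{excess loss, bounded on fresh data}}
  \;+\;
  \underbrace{\gamma\,\mathbb{E}_{h' \sim \pi}\bigl[L(h')\bigr]}_{\text{bounded recursively}}
\]
```

Because the second term reuses the confidence bound already established for the prior on earlier data, the confidence information from previously processed points is carried forward rather than discarded.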
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, this is often infeasible in practice due to memory constraints or data privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
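As background for the entry above, the generic Cramér-Chernoff bound is sketched below; λ is the free parameter that such approaches optimize, and the notation is introduced here rather than quoted from the paper.

```latex
% Cramér-Chernoff: exponentiate, apply Markov's inequality, and optimize
% over the free parameter \lambda > 0.
\[
  \Pr\bigl(X \ge t\bigr)
  \;\le\;
  \inf_{\lambda > 0} \, e^{-\lambda t}\, \mathbb{E}\bigl[e^{\lambda X}\bigr]
\]
```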
arXiv Detail & Related papers (2024-01-02T10:58:54Z)
- ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
arXiv Detail & Related papers (2022-11-21T13:23:09Z)
- Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions [20.531576904743282]
Off-policy estimation bias is corrected in a per-decision manner.
Off-policy algorithms such as Tree Backup and Retrace rely on this mechanism.
We propose a multistep operator that permits arbitrary past-dependent traces.
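For context, a standard per-decision multistep off-policy evaluation operator with traces (the form used by Tree Backup and Retrace) can be sketched as follows; the specific past-dependent traces proposed in the paper above are not reproduced here, and the notation is introduced for illustration.

```latex
% Temporal-difference errors \delta_t are weighted by a product of per-decision
% trace coefficients c_s; e.g. Retrace uses c_s = \lambda \min(1, \pi(a_s|x_s)/\mu(a_s|x_s)).
\[
  (\mathcal{R} Q)(x, a)
  \;=\;
  Q(x, a) + \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0} \gamma^{t} \Bigl(\prod_{s=1}^{t} c_{s}\Bigr) \delta_t\right],
  \qquad
  \delta_t = r_t + \gamma\,\mathbb{E}_{\pi}\bigl[Q(x_{t+1}, \cdot)\bigr] - Q(x_t, a_t)
\]
```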
arXiv Detail & Related papers (2021-12-23T00:07:28Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
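A minimal sketch of an importance-sampling-weighted ERM objective of this kind is given below; the weights w_t, the propensities e_t, the loss ℓ and the hypothesis class H are notation introduced here, and the paper's exact estimator may differ.

```latex
% Reweighting each adaptively collected sample Z_t by the inverse of its (known)
% collection probability e_t(Z_t) restores an unbiased estimate of the risk.
\[
  \hat{h} \;\in\; \arg\min_{h \in \mathcal{H}} \; \frac{1}{n} \sum_{t=1}^{n} w_t\, \ell(h; Z_t),
  \qquad
  w_t = \frac{1}{e_t(Z_t)}
\]
```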
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
- PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z)
- On the role of data in PAC-Bayes bounds [24.53731903804468]
The dominant term in PAC-Bayes bounds is often the Kullback-Leibler divergence between the posterior and the prior.
We show that the bound based on the prior can be suboptimal.
We show that using data can mean the difference between vacuous and nonvacuous bounds.
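For reference, the classical PAC-Bayes-kl bound (for losses in [0, 1]) makes the role of the KL term explicit; the notation below is standard but introduced here rather than quoted from the paper.

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors \rho:
\[
  \mathrm{kl}\!\left(\hat{L}_S(\rho)\,\middle\|\,L(\rho)\right)
  \;\le\;
  \frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n}
\]
% \hat{L}_S(\rho) and L(\rho) are the empirical and expected losses of the
% randomized classifier \rho, \pi is a prior fixed before seeing S, and
% \mathrm{kl}(p\,\|\,q) is the binary KL divergence.
```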
arXiv Detail & Related papers (2020-06-19T02:03:48Z)
- PAC-Bayes unleashed: generalisation bounds with unbounded losses [12.078257783674923]
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions.
This extends the relevance and applicability of the PAC-Bayes learning framework.
arXiv Detail & Related papers (2020-06-12T15:55:46Z)