On the role of data in PAC-Bayes bounds
- URL: http://arxiv.org/abs/2006.10929v2
- Date: Tue, 27 Oct 2020 03:04:42 GMT
- Title: On the role of data in PAC-Bayes bounds
- Authors: Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino,
Daniel M. Roy
- Abstract summary: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler divergence between the posterior and prior.
We show that the bound based on the oracle prior can be suboptimal.
We show that using data to learn the prior can mean the difference between vacuous and nonvacuous bounds.
- Score: 24.53731903804468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler
divergence between the posterior and prior. For so-called linear PAC-Bayes risk
bounds based on the empirical risk of a fixed posterior kernel, it is possible
to minimize the expected value of the bound by choosing the prior to be the
expected posterior, which we call the oracle prior because it is
distribution dependent. In this work, we show that the bound based on the
oracle prior can be suboptimal: In some cases, a stronger bound is obtained by
using a data-dependent oracle prior, i.e., a conditional expectation of the
posterior, given a subset of the training data that is then excluded from the
empirical risk term. While using data to learn a prior is a known heuristic,
its essential role in optimal bounds is new. In fact, we show that using data
can mean the difference between vacuous and nonvacuous bounds. We apply this
new principle in the setting of nonconvex learning, simulating data-dependent
oracle priors on MNIST and Fashion MNIST with and without held-out data, and
demonstrating new nonvacuous bounds in both cases.
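As a reading aid, here is a schematic version of the argument in generic PAC-Bayes notation (the symbols below are illustrative and not quoted from the paper). A linear PAC-Bayes bound has the form
  \[ \mathbb{E}_{h \sim Q_S}[R(h)] \;\le\; c_1\, \mathbb{E}_{h \sim Q_S}[\hat{R}_S(h)] \;+\; c_2\, \frac{\mathrm{KL}(Q_S \,\|\, P) + \log(1/\delta)}{n}, \]
where $c_1, c_2$ are constants determined by a fixed parameter, so in expectation over the sample $S$ the prior $P$ only enters through $\mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, P)]$. By the compensation identity
  \[ \mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, P)] \;=\; \mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, \bar{Q})] + \mathrm{KL}(\bar{Q} \,\|\, P), \qquad \bar{Q} := \mathbb{E}_S[Q_S], \]
this expectation is minimized by the oracle prior $P = \bar{Q}$. The data-dependent oracle prior instead takes $P = \mathbb{E}[Q_S \mid S_J]$ for a subset $S_J$ of the training data and charges the empirical risk only on the remaining points $S \setminus S_J$, which is the construction the abstract refers to.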
Related papers
- Tight Lower Bounds and Improved Convergence in Performative Prediction [29.169972807928]
We extend the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous snapshots.
We introduce a new upper bound for methods that use only the most recent dataset snapshot.
We empirically observe faster convergence to the stable point on various performative prediction benchmarks.
arXiv Detail & Related papers (2024-12-04T19:06:19Z)
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.
We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
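As background for the GCV claim, the following is a minimal numpy sketch of the classical estimator in the textbook i.i.d. setting (the data-generating choices are illustrative; this is not the paper's correlated-sample analysis):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 50, 1.0

    # Illustrative i.i.d. data: y = X w* + noise.
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ w_star + 0.5 * rng.standard_normal(n)

    # Ridge hat matrix H = X (X^T X + lam I)^{-1} X^T and residuals.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = y - H @ y

    # Generalized cross-validation estimate of the out-of-sample risk.
    gcv = np.mean(resid ** 2) / (1.0 - np.trace(H) / n) ** 2

    # Monte Carlo estimate of the true out-of-sample risk for comparison.
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    X_test = rng.standard_normal((5000, d))
    y_test = X_test @ w_star + 0.5 * rng.standard_normal(5000)
    test_risk = np.mean((y_test - X_test @ w_hat) ** 2)
    print(f"GCV: {gcv:.3f}  test risk: {test_risk:.3f}")

With i.i.d. rows the two numbers roughly agree; the paper's point is that this agreement can break down once the samples are correlated.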
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss [16.84312626844573]
PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning.
We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss.
arXiv Detail & Related papers (2024-05-23T15:15:17Z)
- PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
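For context, the classical Cramér-Chernoff bound that the paper extends states, in generic notation not quoted from the abstract, that for a real-valued random variable $X$ and any $t$,
  \[ \Pr(X \ge t) \;\le\; \exp\!\Big(-\sup_{\lambda > 0} \big\{\lambda t - \log \mathbb{E}[e^{\lambda X}]\big\}\Big), \]
i.e. the bound is obtained by optimizing exactly over the free parameter $\lambda$, which is the property the summary alludes to.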
arXiv Detail & Related papers (2024-01-02T10:58:54Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
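The abstract does not spell out the relaxation, but the general idea of a differentiable calibration term can be sketched as follows (PyTorch; the interval parameterization, sigmoid relaxation, temperature, and penalty weight are all illustrative assumptions, not the authors' formulation):

    import torch

    def soft_coverage(lower, upper, y, tau=0.05):
        # Smooth surrogate for the indicator 1[lower <= y <= upper]: a product
        # of sigmoids lets gradients flow through the predicted interval bounds.
        return torch.sigmoid((y - lower) / tau) * torch.sigmoid((upper - y) / tau)

    def calibrated_loss(nll, lower, upper, y, target=0.9, weight=1.0):
        # Base loss (e.g. negative log-likelihood) plus a penalty pushing the
        # empirical coverage of the predicted intervals toward `target`.
        coverage = soft_coverage(lower, upper, y).mean()
        return nll + weight * (coverage - target) ** 2

    # Toy usage: interval bounds produced by a dummy learnable width parameter.
    y = torch.randn(128)
    width = torch.tensor(0.5, requires_grad=True)
    loss = calibrated_loss(torch.tensor(0.0), y - width, y + width, y)
    loss.backward()  # gradients reach `width` through the relaxed coverage term

Because the penalty is differentiable in the quantities the model outputs, it can simply be added to the usual training objective and backpropagated end to end, which is the point the summary makes.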
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
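A minimal sketch of one natural pretraining-finetuning scheme for linear regression is shown below (the biased-ridge finetuning step is an illustrative choice, not necessarily the estimator analyzed in the paper):

    import numpy as np

    def pretrain(X_src, y_src):
        # Ordinary least squares on the abundant source data.
        return np.linalg.lstsq(X_src, y_src, rcond=None)[0]

    def finetune(X_tgt, y_tgt, w_pre, lam=1.0):
        # Ridge regression biased toward the pretrained weights:
        #   argmin_w ||y_tgt - X_tgt w||^2 + lam * ||w - w_pre||^2
        d = X_tgt.shape[1]
        A = X_tgt.T @ X_tgt + lam * np.eye(d)
        b = X_tgt.T @ y_tgt + lam * w_pre
        return np.linalg.solve(A, b)

    # Toy covariate shift: same true weights, different input distributions.
    rng = np.random.default_rng(0)
    d = 20
    w_star = rng.standard_normal(d)
    X_src = rng.standard_normal((4000, d))       # many source samples
    X_tgt = 2.0 * rng.standard_normal((50, d))   # few, shifted target samples
    y_src = X_src @ w_star + 0.1 * rng.standard_normal(4000)
    y_tgt = X_tgt @ w_star + 0.1 * rng.standard_normal(50)
    w_hat = finetune(X_tgt, y_tgt, pretrain(X_src, y_src))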
arXiv Detail & Related papers (2022-08-03T05:59:49Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
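A minimal sketch of the importance-sampling-weighted ERM idea on adaptively collected (e.g. bandit-logged) data; the logistic loss and the propensity setup are illustrative choices, not the paper's exact algorithm:

    import numpy as np

    def weighted_erm(X, y, propensities, lr=0.1, steps=500):
        # Minimize the importance-weighted average logistic loss
        #   (1/n) * sum_i (1 / e_i) * log(1 + exp(-y_i * x_i^T w)),
        # where e_i is the probability with which sample i was collected.
        n, d = X.shape
        w = np.zeros(d)
        weights = 1.0 / propensities              # inverse-propensity weights
        for _ in range(steps):
            margins = y * (X @ w)
            grad = -(X.T @ (weights * y / (1.0 + np.exp(margins)))) / n
            w -= lr * grad
        return w

    # Toy data: binary labels in {-1, +1}, collection probabilities standing in
    # for an adaptive (e.g. bandit) logging policy.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 5))
    y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(500))
    propensities = rng.uniform(0.2, 1.0, size=500)
    w_hat = weighted_erm(X, y, propensities)

Reweighting each loss term by the inverse of its collection probability yields an unbiased estimate of the population risk, which is the standard motivation for this family of estimators.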
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z)
- Convex Nonparanormal Regression [8.497456090408084]
We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for estimating the posterior conditional distribution.
For the special but powerful case of a piecewise linear dictionary, we provide a closed form of the posterior mean.
We demonstrate the advantages of CNR over classical competitors using synthetic and real world data.
arXiv Detail & Related papers (2020-04-21T19:42:43Z)