On the role of data in PAC-Bayes bounds
- URL: http://arxiv.org/abs/2006.10929v2
- Date: Tue, 27 Oct 2020 03:04:42 GMT
- Title: On the role of data in PAC-Bayes bounds
- Authors: Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino,
Daniel M. Roy
- Abstract summary: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler divergence between the posterior and prior.
We show that the bound based on the oracle prior can be suboptimal.
We show that using data can mean the difference between vacuous and nonvacuous bounds.
- Score: 24.53731903804468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler
divergence between the posterior and prior. For so-called linear PAC-Bayes risk
bounds based on the empirical risk of a fixed posterior kernel, it is possible
to minimize the expected value of the bound by choosing the prior to be the
expected posterior, which we call the oracle prior because it is
distribution dependent. In this work, we show that the bound based on the
oracle prior can be suboptimal: In some cases, a stronger bound is obtained by
using a data-dependent oracle prior, i.e., a conditional expectation of the
posterior, given a subset of the training data that is then excluded from the
empirical risk term. While using data to learn a prior is a known heuristic,
its essential role in optimal bounds is new. In fact, we show that using data
can mean the difference between vacuous and nonvacuous bounds. We apply this
new principle in the setting of nonconvex learning, simulating data-dependent
oracle priors on MNIST and Fashion MNIST with and without held-out data, and
demonstrating new nonvacuous bounds in both cases.
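The paper's central mechanism can be sketched numerically: fitting the prior's center on a held-out split of the training data shrinks the KL term, while the empirical-risk term of the bound is then evaluated only on the remaining data. The toy sketch below uses a McAllester-style PAC-Bayes bound with isotropic Gaussian prior and posterior over a linear classifier's weights; this is an illustration under simplifying assumptions (it is not the paper's exact linear bound, and the risk of the posterior mean stands in for the Gibbs risk), and all names, constants, and the ridge "learner" are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_isotropic_gaussians(mu_q, sig_q, mu_p, sig_p):
    # Closed-form KL( N(mu_q, sig_q^2 I) || N(mu_p, sig_p^2 I) )
    d = mu_q.size
    return (d * np.log(sig_p / sig_q)
            + (d * sig_q**2 + np.sum((mu_q - mu_p) ** 2)) / (2 * sig_p**2)
            - d / 2)

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    # R(Q) <= R_hat(Q) + sqrt( (KL(Q||P) + ln(2 sqrt(n)/delta)) / (2n) )
    return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# Toy data: linear model with noisy sign labels
d, n = 50, 2000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))

def fit_mean(Xs, ys):
    # Ridge regression as a stand-in learner for a Gaussian's mean
    return np.linalg.solve(Xs.T @ Xs + np.eye(d), Xs.T @ ys)

def risk(w, Xs, ys):
    # 0-1 risk of the mean classifier (stand-in for the Gibbs risk)
    return np.mean(np.sign(Xs @ w) != ys)

mu_q, sig = fit_mean(X, y), 0.1   # posterior mean trained on all n points

# (a) Data-free prior centered at zero; empirical risk uses all n points.
kl_fixed = kl_isotropic_gaussians(mu_q, sig, np.zeros(d), sig)
b_fixed = mcallester_bound(risk(mu_q, X, y), kl_fixed, n)

# (b) Data-dependent prior: its mean is fit on the first half of the data,
#     which is then EXCLUDED from the empirical-risk term of the bound.
m = n // 2
mu_p = fit_mean(X[:m], y[:m])
kl_data = kl_isotropic_gaussians(mu_q, sig, mu_p, sig)
b_data = mcallester_bound(risk(mu_q, X[m:], y[m:]), kl_data, n - m)

print(f"fixed prior bound: {b_fixed:.3f}, data-dependent prior bound: {b_data:.3f}")
```

Even though the data-dependent bound pays for the held-out split with a smaller effective sample size (n - m), the much smaller KL term typically more than compensates, which mirrors the paper's point that data-dependent priors can be essential for tight bounds.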
Related papers
- Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss [16.84312626844573]
PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning.
We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss.
arXiv Detail & Related papers (2024-05-23T15:15:17Z) - PAC-Bayes-Chernoff bounds for unbounded losses [1.9799527196428246]
We introduce a new PAC-Bayes oracle bound for unbounded losses.
This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound.
We show that our result naturally allows exact optimization of the free parameter on many PAC-Bayes bounds.
arXiv Detail & Related papers (2024-01-02T10:58:54Z) - Calibrating Neural Simulation-Based Inference with Differentiable
Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - The Power and Limitation of Pretraining-Finetuning for Linear Regression
under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
arXiv Detail & Related papers (2022-08-03T05:59:49Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - A new framework for experimental design using Bayesian Evidential
Learning: the case of wellhead protection area [0.0]
We predict the wellhead protection area (WHPA), the shape and extent of which is influenced by the distribution of hydraulic conductivity (K), from a small number of tracing experiments (predictors).
Our first objective is to make predictions of the WHPA within the Bayesian Evidential Learning framework, which aims to find a direct relationship between predictor and target using machine learning.
Our second objective is to extend BEL to identify the optimal design of data source locations that minimizes the posterior uncertainty of the WHPA.
arXiv Detail & Related papers (2021-05-12T09:40:28Z) - DeVLBert: Learning Deconfounded Visio-Linguistic Representations [111.93480424791613]
We investigate the problem of out-of-domain visio-linguistic pretraining.
Existing methods for this problem are purely likelihood-based.
We propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning.
arXiv Detail & Related papers (2020-08-16T11:09:22Z) - PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z) - Convex Nonparanormal Regression [8.497456090408084]
We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for estimating the posterior conditional distribution.
For the special but powerful case of a piecewise linear dictionary, we provide a closed form of the posterior mean.
We demonstrate the advantages of CNR over classical competitors using synthetic and real world data.
arXiv Detail & Related papers (2020-04-21T19:42:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.