On the role of data in PAC-Bayes bounds
- URL: http://arxiv.org/abs/2006.10929v2
- Date: Tue, 27 Oct 2020 03:04:42 GMT
- Title: On the role of data in PAC-Bayes bounds
- Authors: Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino,
Daniel M. Roy
- Abstract summary: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler divergence between the posterior and prior.
We show that the bound based on the oracle prior can be suboptimal.
We show that using data to learn the prior can mean the difference between vacuous and nonvacuous bounds.
- Score: 24.53731903804468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler
divergence between the posterior and prior. For so-called linear PAC-Bayes risk
bounds based on the empirical risk of a fixed posterior kernel, it is possible
to minimize the expected value of the bound by choosing the prior to be the
expected posterior, which we call the oracle prior because it is
distribution dependent. In this work, we show that the bound based on the
oracle prior can be suboptimal: In some cases, a stronger bound is obtained by
using a data-dependent oracle prior, i.e., a conditional expectation of the
posterior, given a subset of the training data that is then excluded from the
empirical risk term. While using data to learn a prior is a known heuristic,
its essential role in optimal bounds is new. In fact, we show that using data
can mean the difference between vacuous and nonvacuous bounds. We apply this
new principle in the setting of nonconvex learning, simulating data-dependent
oracle priors on MNIST and Fashion MNIST with and without held-out data, and
demonstrating new nonvacuous bounds in both cases.
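As a reading aid, here is a schematic version of the argument in generic PAC-Bayes notation (the symbols below are illustrative and not quoted from the paper). A linear PAC-Bayes bound has the form
  \[ \mathbb{E}_{h \sim Q_S}[R(h)] \;\le\; c_1\, \mathbb{E}_{h \sim Q_S}[\hat{R}_S(h)] \;+\; c_2\, \frac{\mathrm{KL}(Q_S \,\|\, P) + \log(1/\delta)}{n}, \]
where $c_1, c_2$ are constants determined by a fixed parameter, so in expectation over the sample $S$ the prior $P$ only enters through $\mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, P)]$. By the compensation identity
  \[ \mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, P)] \;=\; \mathbb{E}_S[\mathrm{KL}(Q_S \,\|\, \bar{Q})] + \mathrm{KL}(\bar{Q} \,\|\, P), \qquad \bar{Q} := \mathbb{E}_S[Q_S], \]
this expectation is minimized by the oracle prior $P = \bar{Q}$. The data-dependent oracle prior instead takes $P = \mathbb{E}[Q_S \mid S_J]$ for a subset $S_J$ of the training data and charges the empirical risk only on the remaining points $S \setminus S_J$, which is the construction the abstract refers to.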
Related papers
- Tight Lower Bounds and Improved Convergence in Performative Prediction [29.169972807928]
We extend the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous snapshots.
We introduce a new upper bound for methods that use only the most recent dataset snapshot.
We empirically observe faster convergence to the stable point on various performative prediction benchmarks.
arXiv Detail & Related papers (2024-12-04T19:06:19Z)
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.
We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
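As background for the GCV claim, the following is a minimal numpy sketch of the classical estimator in the textbook i.i.d. setting (the data-generating choices are illustrative; this is not the paper's correlated-sample analysis):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 50, 1.0

    # Illustrative i.i.d. data: y = X w* + noise.
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ w_star + 0.5 * rng.standard_normal(n)

    # Ridge hat matrix H = X (X^T X + lam I)^{-1} X^T and residuals.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = y - H @ y

    # Generalized cross-validation estimate of the out-of-sample risk.
    gcv = np.mean(resid ** 2) / (1.0 - np.trace(H) / n) ** 2

    # Monte Carlo estimate of the true out-of-sample risk for comparison.
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    X_test = rng.standard_normal((5000, d))
    y_test = X_test @ w_star + 0.5 * rng.standard_normal(5000)
    test_risk = np.mean((y_test - X_test @ w_hat) ** 2)
    print(f"GCV: {gcv:.3f}  test risk: {test_risk:.3f}")

With i.i.d. rows the two numbers roughly agree; the paper's point is that this agreement can break down once the samples are correlated.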
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss [16.84312626844573]
PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning.
We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss.
arXiv Detail & Related papers (2024-05-23T15:15:17Z)
- PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
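For context, the classical Cramér-Chernoff bound that the paper extends states, in generic notation not quoted from the abstract, that for a real-valued random variable $X$ and any $t$,
  \[ \Pr(X \ge t) \;\le\; \exp\!\Big(-\sup_{\lambda > 0} \big\{\lambda t - \log \mathbb{E}[e^{\lambda X}]\big\}\Big), \]
i.e. the bound is obtained by optimizing exactly over the free parameter $\lambda$, which is the property the summary alludes to.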
arXiv Detail & Related papers (2024-01-02T10:58:54Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
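The abstract does not spell out the relaxation, but the general idea of a differentiable calibration term can be sketched as follows (PyTorch; the interval parameterization, sigmoid relaxation, temperature, and penalty weight are all illustrative assumptions, not the authors' formulation):

    import torch

    def soft_coverage(lower, upper, y, tau=0.05):
        # Smooth surrogate for the indicator 1[lower <= y <= upper]: a product
        # of sigmoids lets gradients flow through the predicted interval bounds.
        return torch.sigmoid((y - lower) / tau) * torch.sigmoid((upper - y) / tau)

    def calibrated_loss(nll, lower, upper, y, target=0.9, weight=1.0):
        # Base loss (e.g. negative log-likelihood) plus a penalty pushing the
        # empirical coverage of the predicted intervals toward `target`.
        coverage = soft_coverage(lower, upper, y).mean()
        return nll + weight * (coverage - target) ** 2

    # Toy usage: interval bounds produced by a dummy learnable width parameter.
    y = torch.randn(128)
    width = torch.tensor(0.5, requires_grad=True)
    loss = calibrated_loss(torch.tensor(0.0), y - width, y + width, y)
    loss.backward()  # gradients reach `width` through the relaxed coverage term

Because the penalty is differentiable in the quantities the model outputs, it can simply be added to the usual training objective and backpropagated end to end, which is the point the summary makes.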
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
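A minimal sketch of one natural pretraining-finetuning scheme for linear regression is shown below (the biased-ridge finetuning step is an illustrative choice, not necessarily the estimator analyzed in the paper):

    import numpy as np

    def pretrain(X_src, y_src):
        # Ordinary least squares on the abundant source data.
        return np.linalg.lstsq(X_src, y_src, rcond=None)[0]

    def finetune(X_tgt, y_tgt, w_pre, lam=1.0):
        # Ridge regression biased toward the pretrained weights:
        #   argmin_w ||y_tgt - X_tgt w||^2 + lam * ||w - w_pre||^2
        d = X_tgt.shape[1]
        A = X_tgt.T @ X_tgt + lam * np.eye(d)
        b = X_tgt.T @ y_tgt + lam * w_pre
        return np.linalg.solve(A, b)

    # Toy covariate shift: same true weights, different input distributions.
    rng = np.random.default_rng(0)
    d = 20
    w_star = rng.standard_normal(d)
    X_src = rng.standard_normal((4000, d))       # many source samples
    X_tgt = 2.0 * rng.standard_normal((50, d))   # few, shifted target samples
    y_src = X_src @ w_star + 0.1 * rng.standard_normal(4000)
    y_tgt = X_tgt @ w_star + 0.1 * rng.standard_normal(50)
    w_hat = finetune(X_tgt, y_tgt, pretrain(X_src, y_src))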
arXiv Detail & Related papers (2022-08-03T05:59:49Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
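A minimal sketch of the importance-sampling-weighted ERM idea on adaptively collected (e.g. bandit-logged) data; the logistic loss and the propensity setup are illustrative choices, not the paper's exact algorithm:

    import numpy as np

    def weighted_erm(X, y, propensities, lr=0.1, steps=500):
        # Minimize the importance-weighted average logistic loss
        #   (1/n) * sum_i (1 / e_i) * log(1 + exp(-y_i * x_i^T w)),
        # where e_i is the probability with which sample i was collected.
        n, d = X.shape
        w = np.zeros(d)
        weights = 1.0 / propensities              # inverse-propensity weights
        for _ in range(steps):
            margins = y * (X @ w)
            grad = -(X.T @ (weights * y / (1.0 + np.exp(margins)))) / n
            w -= lr * grad
        return w

    # Toy data: binary labels in {-1, +1}, collection probabilities standing in
    # for an adaptive (e.g. bandit) logging policy.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 5))
    y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(500))
    propensities = rng.uniform(0.2, 1.0, size=500)
    w_hat = weighted_erm(X, y, propensities)

Reweighting each loss term by the inverse of its collection probability yields an unbiased estimate of the population risk, which is the standard motivation for this family of estimators.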
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z)
- Convex Nonparanormal Regression [8.497456090408084]
We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for estimating the posterior conditional distribution.
For the special but powerful case of a piecewise linear dictionary, we provide a closed form of the posterior mean.
We demonstrate the advantages of CNR over classical competitors using synthetic and real world data.
arXiv Detail & Related papers (2020-04-21T19:42:43Z)