Dimension-robust Function Space MCMC With Neural Network Priors
- URL: http://arxiv.org/abs/2012.10943v1
- Date: Sun, 20 Dec 2020 14:52:57 GMT
- Title: Dimension-robust Function Space MCMC With Neural Network Priors
- Authors: Torben Sell, Sumeetpal S. Singh
- Abstract summary: This paper introduces a new prior on function spaces which scales more favourably in the dimension of the function's domain.
We show that our resulting posterior of the unknown function is amenable to sampling using Hilbert space Markov chain Monte Carlo methods.
We show that our priors are competitive and have distinct advantages over other function space priors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a new prior on function spaces which scales more
favourably in the dimension of the function's domain compared to the usual
Karhunen-Loève function space prior, a property we refer to as
dimension-robustness. The proposed prior is a Bayesian neural network prior,
where each weight and bias has an independent Gaussian prior, but with the key
difference that the variances decrease in the width of the network, such that
the variances form a summable sequence and the infinite width limit neural
network is well defined. We show that our resulting posterior of the unknown
function is amenable to sampling using Hilbert space Markov chain Monte Carlo
methods. These sampling methods are favoured because they are stable under
mesh-refinement, in the sense that the acceptance probability does not shrink
to 0 as more parameters are introduced to better approximate the well-defined
infinite limit. We show that our priors are competitive and have distinct
advantages over other function space priors. Upon defining a suitable
likelihood for continuous value functions in a Bayesian approach to
reinforcement learning, our new prior is used in numerical examples to
illustrate its performance and dimension-robustness.
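The abstract's two key ingredients lend themselves to a short illustration: a Gaussian weight prior whose per-unit variances decay with the hidden-unit index (so the variances are summable and the infinite-width limit is well defined), and a Hilbert-space MCMC move that leaves that Gaussian prior invariant. The sketch below is an illustrative reconstruction, not the authors' code: the single tanh hidden layer, the decay exponent, the Gaussian regression likelihood, and the use of a preconditioned Crank-Nicolson (pCN) proposal as the Hilbert-space sampler are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical width-decaying-variance BNN prior (illustrative, not the paper's exact construction). ---
# Each weight/bias gets an independent Gaussian prior whose std dev shrinks with the
# hidden-unit index, so the variances form a summable sequence.

def prior_std(width, decay=1.0):
    """Per-unit prior std dev sigma_k = k^{-(0.5 + decay/2)}; the variances sigma_k^2 are summable."""
    k = np.arange(1, width + 1)
    return k ** (-(0.5 + decay / 2.0))

def sample_network(width, dim_in, decay=1.0):
    """Draw one network from the prior: xi are standard normals, actual weights = sigma * xi."""
    sig = prior_std(width, decay)
    xi_w1 = rng.standard_normal((width, dim_in))   # input-to-hidden weights (whitened)
    xi_b1 = rng.standard_normal(width)             # hidden biases (whitened)
    xi_w2 = rng.standard_normal(width)             # hidden-to-output weights (whitened)
    return sig, (xi_w1, xi_b1, xi_w2)

def network_fn(x, sig, xi):
    """Evaluate the BNN; the decaying scales sig multiply the whitened coordinates xi."""
    xi_w1, xi_b1, xi_w2 = xi
    h = np.tanh(x @ (sig[:, None] * xi_w1).T + sig * xi_b1)  # hidden layer, shape (n, width)
    return h @ (sig * xi_w2)                                  # scalar output per input

# --- Hypothetical pCN (Hilbert-space MCMC) step on the whitened coordinates. ---
# The pCN proposal is reversible w.r.t. the N(0, I) prior on xi, so the accept/reject
# step involves only the likelihood.

def pcn_step(xi, log_lik, beta=0.2):
    flat = np.concatenate([a.ravel() for a in xi])
    prop = np.sqrt(1.0 - beta**2) * flat + beta * rng.standard_normal(flat.size)
    # reshape the flat proposal back into the network's parameter blocks
    shapes = [a.shape for a in xi]
    sizes = [a.size for a in xi]
    parts = np.split(prop, np.cumsum(sizes)[:-1])
    xi_prop = tuple(p.reshape(s) for p, s in zip(parts, shapes))
    if np.log(rng.uniform()) < log_lik(xi_prop) - log_lik(xi):
        return xi_prop, True
    return xi, False

# Toy usage: Gaussian-noise regression likelihood on synthetic 1-D data.
x_obs = np.linspace(-1, 1, 20)[:, None]
y_obs = np.sin(3 * x_obs[:, 0]) + 0.1 * rng.standard_normal(20)
sig, xi = sample_network(width=50, dim_in=1)
log_lik = lambda xi_: -0.5 * np.sum((network_fn(x_obs, sig, xi_) - y_obs) ** 2) / 0.1**2
for _ in range(1000):
    xi, _accepted = pcn_step(xi, log_lik)
```

Because the pCN proposal leaves the whitened Gaussian prior invariant, the acceptance ratio depends only on the likelihood; under these assumptions this is the mechanism behind the acceptance probability not collapsing as more parameters (wider networks) are introduced.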
Related papers
- Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks [9.023847175654604]
Posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult.
This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights.
We show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets.
arXiv Detail & Related papers (2024-06-06T17:57:49Z) - Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network [59.79008107609297]
We propose in this paper to approximate the joint posterior over the structure and parameters of a Bayesian Network.
We use a single GFlowNet whose sampling policy follows a two-phase process.
Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models.
arXiv Detail & Related papers (2023-05-30T19:16:44Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Bayesian neural network priors for edge-preserving inversion [3.2046720177804646]
A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced.
We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite.
arXiv Detail & Related papers (2021-12-20T16:39:05Z) - All You Need is a Good Functional Prior for Bayesian Deep Learning [15.10662960548448]
We argue that this is a hugely limiting aspect of Bayesian deep learning.
We propose a novel and robust framework to match a Gaussian process prior with the functional prior of neural networks.
We provide vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements.
arXiv Detail & Related papers (2020-11-25T15:36:16Z) - Understanding Variational Inference in Function-Space [20.940162027560408]
We highlight some advantages and limitations of employing the Kullback-Leibler divergence in this setting.
We propose (featurized) Bayesian linear regression as a benchmark for 'function-space' inference methods that directly measures approximation quality.
arXiv Detail & Related papers (2020-11-18T17:42:01Z) - The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks [4.307812758854161]
We construct a prior distribution for the parameters of a network that approximates the posited Gaussian process in the output space of the network.
This establishes the property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular.
arXiv Detail & Related papers (2020-10-16T16:39:45Z) - Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)