Functional Stochastic Gradient MCMC for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2409.16632v2
- Date: Thu, 10 Oct 2024 04:20:01 GMT
- Title: Functional Stochastic Gradient MCMC for Bayesian Neural Networks
- Authors: Mengjing Wu, Junyu Xuan, Jie Lu
- Abstract summary: We introduce novel functional MCMC schemes, including stochastic gradient versions, that can incorporate more informative functional priors.
Our schemes demonstrate improved performance in both predictive accuracy and uncertainty quantification on several tasks.
- Score: 15.766590837199427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical parameter-space Bayesian inference for Bayesian neural networks (BNNs) suffers from several unresolved prior issues, such as knowledge encoding intractability and pathological behaviours in deep networks, which can lead to improper posterior inference. To address these issues, functional Bayesian inference, which leverages functional priors, has recently been proposed; the emerging functional variational inference is one such scheme. In addition to variational methods, stochastic gradient Markov Chain Monte Carlo (MCMC) is another scalable and effective inference method for BNNs that asymptotically generates samples from the true posterior by simulating continuous dynamics. However, existing MCMC methods operate solely in parameter space and inherit the unresolved prior issues, while extending these dynamics to function space is a non-trivial undertaking. In this paper, we introduce novel functional MCMC schemes, including stochastic gradient versions, based on newly designed diffusion dynamics that can incorporate more informative functional priors. Moreover, we prove that the stationary measure of these functional dynamics is the target posterior over functions. Our functional MCMC schemes demonstrate improved performance in both predictive accuracy and uncertainty quantification on several tasks compared to naive parameter-space MCMC and functional variational inference.
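For orientation, the following is a minimal sketch of plain parameter-space stochastic gradient Langevin dynamics (SGLD), the kind of stochastic gradient MCMC baseline that the paper lifts to function space. It is a toy illustration, not the paper's functional scheme; the data, network, Gaussian prior, and step size are all illustrative assumptions.

```python
# Parameter-space SGLD (Welling & Teh, 2011) on a toy BNN regression task.
# NOTE: this is the naive baseline the paper compares against, not the
# functional MCMC scheme it proposes; all settings here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
N, batch = 256, 32
x = torch.linspace(-3, 3, N).unsqueeze(-1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)          # toy data

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
step_size, prior_var, noise_var = 1e-4, 1.0, 0.01

samples = []
for t in range(2000):
    idx = torch.randint(0, N, (batch,))
    net.zero_grad()
    # Unbiased minibatch estimate of the log posterior:
    # (N / batch) * log-likelihood + log N(theta; 0, prior_var * I).
    log_lik = -0.5 * ((y[idx] - net(x[idx])) ** 2).sum() / noise_var
    log_prior = sum(-0.5 * (p ** 2).sum() / prior_var
                    for p in net.parameters())
    (N / batch * log_lik + log_prior).backward()
    with torch.no_grad():
        for p in net.parameters():
            # theta <- theta + (eps / 2) * grad log p(theta | D) + N(0, eps)
            p.add_(0.5 * step_size * p.grad
                   + step_size ** 0.5 * torch.randn_like(p))
    if t % 100 == 99:
        samples.append([p.detach().clone() for p in net.parameters()])
```

Averaging predictions across the retained `samples` approximates the posterior predictive; the paper's functional schemes instead design diffusion dynamics whose stationary measure is the posterior over functions.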
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta-learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
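One generic way to realize this kind of reformulation, sketched below under illustrative assumptions (and not necessarily the paper's algorithm), is to treat the inner problem as inducing a Gibbs-style distribution over inner parameters and to average the outer loss over Langevin samples from it.

```python
# Hypothetical sketch: the inner loss induces a smooth (Gibbs-style)
# distribution p(theta | lam) ~ exp(-L_in(theta, lam) / tau), and the outer
# objective becomes E_theta[L_out(theta)]. L_in, L_out, tau, and the step
# sizes are illustrative stand-ins, not the paper's choices.
import numpy as np

rng = np.random.default_rng(0)

def grad_L_in(theta, lam):      # gradient of a toy regularized inner loss
    return (theta - 1.0) + 2.0 * lam * theta

def L_out(theta):               # toy outer (validation) loss
    return 0.5 * np.sum((theta - 0.5) ** 2)

def expected_outer_loss(lam, tau=0.1, eps=1e-2, n_steps=2000, n_keep=200):
    theta = np.zeros(3)
    kept = []
    for t in range(n_steps):
        # Langevin dynamics targeting p(theta | lam) ~ exp(-L_in / tau).
        theta += (-eps * grad_L_in(theta, lam) / tau
                  + np.sqrt(2 * eps) * rng.standard_normal(theta.shape))
        if t >= n_steps - n_keep:
            kept.append(L_out(theta))
    return np.mean(kept)        # Monte Carlo estimate of E[L_out]

print(expected_outer_loss(lam=0.1))
```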
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Proximal Interacting Particle Langevin Algorithms [0.0]
We introduce Proximal Interacting Particle Langevin Algorithms (PIPLA) for inference and learning in latent variable models.
We propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model.
Our theory and experiments together show that the PIPLA family can be the de facto choice for parameter estimation in non-differentiable latent variable models.
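For intuition on the proximal ingredient, the sketch below shows a single-chain Moreau-Yosida-regularized Langevin (MYULA-style) step for a target with a non-differentiable L1 term. This is only the building block that proximal Langevin methods share, not the interacting-particle PIPLA scheme itself; the target and step sizes are illustrative assumptions.

```python
# MYULA-style proximal Langevin step for log pi(x) = log f(x) - g(x), with
# smooth f and non-differentiable g(x) = ||x||_1. Illustrative only; PIPLA
# additionally couples interacting particles to estimate model parameters.
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(x, t):
    # prox of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula_step(x, grad_log_f, lam, gamma):
    # x <- x + gamma * [grad log f(x) + (prox_{lam*g}(x) - x) / lam]
    #        + sqrt(2 * gamma) * xi
    drift = grad_log_f(x) + (soft_threshold(x, lam) - x) / lam
    return x + gamma * drift + np.sqrt(2 * gamma) * rng.standard_normal(x.shape)

grad_log_f = lambda x: -(x - 1.0)   # smooth part: log N(x; 1, I)

x = np.zeros(5)
for _ in range(5000):
    x = myula_step(x, grad_log_f, lam=0.1, gamma=0.01)
```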
arXiv Detail & Related papers (2024-06-20T13:16:41Z) - Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data [0.0]
We introduce fully Bayesian Neural Networks (FBNNs) for active learning tasks in the 'small data' regime.
FBNNs provide reliable predictive distributions, crucial for making informed decisions under uncertainty in the active learning setting.
Here, we assess the suitability and performance of FBNNs with the No-U-Turn Sampler for active learning tasks in the 'small data' regime.
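The acquisition step such a pipeline relies on can be as simple as the sketch below: given posterior predictive draws from the fully Bayesian model (e.g. produced by a NUTS chain, which is assumed rather than shown), query the candidate input whose predictive distribution is widest.

```python
# Uncertainty-based acquisition: pick the candidate with the largest
# posterior predictive variance. In practice the draws would come from the
# FBNN's NUTS chain; random stand-ins are used here.
import numpy as np

def next_query(pred_samples):
    # pred_samples: (num_posterior_samples, num_candidates) array of
    # posterior predictive draws at the candidate inputs.
    return int(np.argmax(pred_samples.var(axis=0)))

rng = np.random.default_rng(0)
draws = rng.standard_normal((500, 100)) * np.linspace(0.1, 1.0, 100)
print(next_query(draws))   # index of the most uncertain candidate
```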
arXiv Detail & Related papers (2024-05-16T05:20:47Z) - Variational Sampling of Temporal Trajectories [39.22854981703244]
We introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space.
Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference.
arXiv Detail & Related papers (2024-03-18T02:12:12Z) - Deep Stochastic Processes via Functional Markov Transition Operators [59.55961312230447]
We introduce a new class of Stochastic Processes (SPs) constructed by stacking sequences of neural-parameterised Markov transition operators in function space.
We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs.
arXiv Detail & Related papers (2023-05-24T21:15:23Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
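As a reminder of what the LLA provides, the sketch below forms its predictive mean and variance for a tiny scalar-output network, using a generalized Gauss-Newton (GGN) approximation to the Hessian. The network is assumed to already sit at its MAP estimate; sizes, prior precision, and noise variance are illustrative assumptions.

```python
# Linearized-Laplace predictive variance via an explicit GGN matrix.
# Feasible only for tiny networks; practical LLA implementations use
# Kronecker-factored or low-rank structure instead.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
x_train = torch.linspace(-2, 2, 40).unsqueeze(-1)  # assume net is the MAP fit
params = list(net.parameters())

def jacobian(xi):
    # Gradient of the scalar output w.r.t. all parameters, flattened.
    out = net(xi).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

noise_var, prior_prec = 0.01, 1.0
D = sum(p.numel() for p in params)
H = prior_prec * torch.eye(D)                 # prior precision + GGN
for i in range(x_train.shape[0]):
    J = jacobian(x_train[i:i + 1])
    H += torch.outer(J, J) / noise_var
Sigma = torch.linalg.inv(H)                   # Laplace posterior covariance

x_star = torch.tensor([[0.5]])
J_star = jacobian(x_star)
mean = net(x_star).item()                     # linearized predictive mean
var = (J_star @ Sigma @ J_star).item()        # LLA predictive variance
```

In Bayesian optimization, this `mean` and `var` would then feed an acquisition function such as expected improvement.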
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Scalable Stochastic Parametric Verification with Stochastic Variational Smoothed Model Checking [1.5293427903448025]
Smoothed model checking (smMC) aims at inferring the satisfaction function over the entire parameter space from a limited set of observations.
In this paper, we exploit recent advances in probabilistic machine learning to push this limitation forward.
We compare the performance of smMC against that of SV-smMC in terms of scalability, computational efficiency, and accuracy of the reconstructed satisfaction function.
arXiv Detail & Related papers (2022-05-11T10:43:23Z) - A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors [60.489902135153415]
This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors.
The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement.
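The flavor of such an alternating scheme can be sketched for the linear model y = Ax + noise with x_i ~ N(0, theta_i) and theta_i ~ Gamma(beta, theta0). The sketch below uses MAP-style (IAS) point updates, whereas the paper's variational scheme propagates densities; A, beta, and theta0 are illustrative assumptions.

```python
# Iterative alternating scheme for a sparsity-promoting gamma hyperprior:
# alternate a weighted ridge solve for x with a closed-form update for the
# per-coordinate prior variances theta. MAP-style sketch, not the paper's
# exact variational iteration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 20
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[[3, 7]] = [2.0, -1.5]                      # sparse ground truth
y = A @ x_true + 0.05 * rng.standard_normal(n)

beta, theta0, noise_var = 1.6, 1e-2, 0.05 ** 2
theta = np.full(d, theta0)
for _ in range(50):
    # x-update: MAP of x given theta (weighted ridge regression).
    x = np.linalg.solve(A.T @ A / noise_var + np.diag(1.0 / theta),
                        A.T @ y / noise_var)
    # theta-update: stationary point of log N(x_i; 0, theta_i)
    # + log Gamma(theta_i; beta, theta0), with eta = beta - 3/2.
    eta = beta - 1.5
    theta = 0.5 * theta0 * (eta + np.sqrt(eta ** 2 + 2.0 * x ** 2 / theta0))
```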
arXiv Detail & Related papers (2021-11-26T06:33:29Z) - A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach and establishes a bounded error when modelling successor feature functions.
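A minimal sketch of the modelling choice, assuming toy states and toy successor features (neither from the paper): fit one independent Gaussian Process per successor-feature dimension.

```python
# One GP per successor-feature dimension (scikit-learn). All data here are
# synthetic stand-ins; the paper's construction and kernels may differ.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(100, 2))               # toy 2-D states
W = rng.standard_normal((3, 2))
psi = np.stack([np.sin(states @ w) for w in W], axis=1)  # toy SFs, (100, 3)

gps = [GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(states, psi[:, j])
       for j in range(psi.shape[1])]

s_new = rng.uniform(-1, 1, size=(5, 2))
sf_mean = np.column_stack([gp.predict(s_new) for gp in gps])
```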
arXiv Detail & Related papers (2021-07-18T12:37:05Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for resolving such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Bayesian neural networks and dimensionality reduction [4.039245878626346]
A class of model-based approaches to such dimensionality-reduction problems includes latent variables in an unknown non-linear regression function.
Variational autoencoders (VAEs) are artificial neural networks (ANNs) that employ approximations to make computation tractable.
We deploy Markov chain Monte Carlo sampling algorithms for Bayesian inference in ANN models with latent variables.
arXiv Detail & Related papers (2020-08-18T17:11:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.