Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations
- URL: http://arxiv.org/abs/2410.15777v2
- Date: Mon, 17 Feb 2025 17:11:46 GMT
- Title: Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations
- Authors: Marcin Sendera, Amin Sorkhei, Tomasz Kuśmierczyk
- Abstract summary: We show that trainable activations are crucial for effective mapping of GP priors to wide BNNs.
We also introduce trainable periodic activations that ensure global stationarity by design.
- Score: 1.0468715529145969
- Abstract: Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are either limited to specific GP kernels or rely on heuristics. We demonstrate that trainable activations are crucial for effective mapping of GP priors to wide BNNs. Specifically, we leverage the closed-form 2-Wasserstein distance for efficient gradient-based optimization of reparameterized priors and activations. Beyond learned activations, we also introduce trainable periodic activations that ensure global stationarity by design, and functional priors conditioned on GP hyperparameters to allow efficient model selection. Empirically, our method consistently outperforms existing approaches or matches the performance of heuristic methods, while offering stronger theoretical foundations.
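The computational ingredient named in the abstract is the closed-form squared 2-Wasserstein distance between two Gaussians, W2^2(N(m1, C1), N(m2, C2)) = ||m1 - m2||^2 + tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2}), which is differentiable and therefore usable as a gradient-based objective for tuning the BNN's prior parameters and activation. The abstract gives no implementation details, so the PyTorch sketch below only illustrates one plausible reading of the recipe under simplifying assumptions: the wide BNN's prior over function values at a set of measurement points is moment-matched to a Gaussian from prior samples, the target is an RBF-kernel GP prior, and the trainable activation is a simple parameterized periodic unit. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (not the authors' code): fit a wide BNN's function-space prior
# to a GP prior by minimizing the closed-form squared 2-Wasserstein distance
#   W2^2(N(m1, C1), N(m2, C2)) = ||m1 - m2||^2 + tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})
# between Gaussian moment-matched BNN outputs and the GP prior at measurement points.
import torch

def psd_sqrt(mat, eps=1e-6):
    # Symmetric PSD matrix square root via eigendecomposition (eps guards tiny eigenvalues).
    evals, evecs = torch.linalg.eigh(mat)
    return evecs @ torch.diag(evals.clamp_min(eps).sqrt()) @ evecs.T

def w2_squared(m1, c1, m2, c2):
    # Closed-form squared 2-Wasserstein distance between N(m1, c1) and N(m2, c2).
    s2 = psd_sqrt(c2)
    cross = psd_sqrt(s2 @ c1 @ s2)
    return (m1 - m2).pow(2).sum() + torch.trace(c1 + c2 - 2.0 * cross)

class PeriodicActivation(torch.nn.Module):
    # Trainable periodic unit a*sin(w*x + b); the paper's exact parameterization is not
    # given in the abstract, so this is an illustrative stand-in.
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.ones(1))
        self.w = torch.nn.Parameter(torch.ones(1))
        self.b = torch.nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.a * torch.sin(self.w * x + self.b)

def bnn_prior_samples(x, act, log_sw, log_sb, width=512, n_samples=256):
    # Draw functions from a one-hidden-layer BNN prior with reparameterized prior scales.
    sw, sb = log_sw.exp(), log_sb.exp()
    n = x.shape[0]
    w1 = sw * torch.randn(n_samples, 1, width)
    b1 = sb * torch.randn(n_samples, 1, width)
    w2 = sw / width ** 0.5 * torch.randn(n_samples, width, 1)
    h = act(x.view(1, n, 1) @ w1 + b1)           # (n_samples, n, width)
    return (h @ w2).squeeze(-1)                  # (n_samples, n)

def rbf_gp_prior(x, lengthscale=1.0, variance=1.0, jitter=1e-6):
    # Zero-mean GP prior with an RBF kernel, evaluated at the measurement points x.
    d2 = (x.view(-1, 1) - x.view(1, -1)).pow(2)
    k = variance * torch.exp(-0.5 * d2 / lengthscale ** 2)
    return torch.zeros(x.shape[0]), k + jitter * torch.eye(x.shape[0])

# Gradient-based matching of the reparameterized BNN prior and activation to the GP prior.
x = torch.linspace(-3.0, 3.0, 32)                # measurement points
act = PeriodicActivation()
log_sw = torch.zeros(1, requires_grad=True)      # log prior scale for weights
log_sb = torch.zeros(1, requires_grad=True)      # log prior scale for biases
opt = torch.optim.Adam(list(act.parameters()) + [log_sw, log_sb], lr=1e-2)
m_gp, c_gp = rbf_gp_prior(x)
for step in range(500):
    f = bnn_prior_samples(x, act, log_sw, log_sb)           # prior function draws
    m_bnn = f.mean(dim=0)
    c_bnn = torch.cov(f.T) + 1e-6 * torch.eye(x.shape[0])   # moment-matched Gaussian
    loss = w2_squared(m_bnn, c_bnn, m_gp, c_gp)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sampling-based moment matching is only one way to obtain the BNN-side Gaussian; the point of the sketch is that, once both sides are summarized by a mean and covariance, the closed-form W2 term yields a single differentiable loss for the reparameterized prior scales and the activation parameters.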
Related papers
- Bridge the Inference Gaps of Neural Processes via Expectation Maximization [27.92039393053804]
The neural process (NP) is a family of computationally efficient models for learning distributions over functions.
We propose a surrogate objective of the target log-likelihood of the meta dataset within the expectation maximization framework.
The resulting model, referred to as the Self-normalized Importance weighted Neural Process (SI-NP), can learn a more accurate functional prior.
arXiv Detail & Related papers (2025-01-04T03:28:21Z)
- Bayesian Optimization via Continual Variational Last Layer Training
We build on variational Bayesian last layers (VBLLs) to connect training of these models to exact conditioning in GPs.
We exploit this connection to develop an efficient online training algorithm that interleaves conditioning and optimization.
Our findings suggest that VBLL networks significantly outperform GPs and other BNN architectures on tasks with complex input correlations.
arXiv Detail & Related papers (2024-12-12T17:21:50Z)
- Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Training-Free Neural Active Learning with Initialization-Robustness Guarantees [27.38525683635627]
We introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning.
Our EV-GP criterion is training-free, i.e., it does not require any training of the NN during data selection.
arXiv Detail & Related papers (2023-06-07T14:28:42Z)
- Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains [7.936841911281107]
We propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate variational inference (CVI).
We are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods.
arXiv Detail & Related papers (2023-06-01T16:31:36Z)
- A Study of Bayesian Neural Network Surrogates for Bayesian Optimization [46.97686790714025]
Bayesian neural networks (BNNs) have recently become practical function approximators.
We study BNNs as alternatives to standard GP surrogates for optimization.
arXiv Detail & Related papers (2023-05-31T17:00:00Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
Convergence of the proposed EGP-TS to the global optimum is further established through an analysis based on the notion of Bayesian regret.
arXiv Detail & Related papers (2022-05-27T16:43:10Z)
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform scalable online prediction and model updates, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions (a minimal sketch of this ensemble recipe follows the list below).
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
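The IE-GP entry above describes a general recipe: scalable GP experts combined through data-adaptive ensemble weights. The blurb does not state the exact feature construction or weight-update rule, so the NumPy sketch below is a hypothetical illustration only: each expert is a random-Fourier-feature approximation of an RBF kernel with its own lengthscale, updated by online Bayesian linear regression in feature space, and the ensemble weights track each expert's running predictive log-likelihood. All class names and hyperparameters are assumptions for illustration, not the original authors' implementation.

```python
# Hypothetical sketch of an ensemble of scalable GP experts with data-adaptive weights.
import numpy as np

class RFExpert:
    # GP expert approximated with random Fourier features; online Bayesian linear
    # regression on these features stands in for exact GP conditioning.
    def __init__(self, lengthscale, dim=1, n_feat=100, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_feat, dim)) / lengthscale  # RBF spectral frequencies
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_feat)
        self.noise2 = noise ** 2
        self.A = np.eye(n_feat)    # posterior precision of feature weights (prior: identity)
        self.r = np.zeros(n_feat)  # precision-weighted posterior mean

    def _phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def predict(self, x):
        phi = self._phi(x)
        mean = phi @ np.linalg.solve(self.A, self.r)
        var = phi @ np.linalg.solve(self.A, phi) + self.noise2
        return mean, var

    def update(self, x, y):
        phi = self._phi(x)
        self.A += np.outer(phi, phi) / self.noise2
        self.r += phi * y / self.noise2

class EnsembleGP:
    # Meta-learner: weight experts by their running predictive log-likelihood.
    def __init__(self, lengthscales):
        self.experts = [RFExpert(l, seed=i) for i, l in enumerate(lengthscales)]
        self.logw = np.zeros(len(self.experts))

    def step(self, x, y):
        means, variances = zip(*(e.predict(x) for e in self.experts))
        w = np.exp(self.logw - self.logw.max())
        w /= w.sum()
        ens_mean = float(np.dot(w, means))        # data-adaptive weighted prediction
        for i, (m, v) in enumerate(zip(means, variances)):
            self.logw[i] += -0.5 * (np.log(2.0 * np.pi * v) + (y - m) ** 2 / v)
        for e in self.experts:
            e.update(x, y)                        # scalable per-expert online update
        return ens_mean

# Streaming usage on a toy 1-D regression stream.
model = EnsembleGP(lengthscales=[0.3, 1.0, 3.0])
rng = np.random.default_rng(42)
for t in range(200):
    x_t = rng.uniform(-3.0, 3.0, size=1)
    y_t = float(np.sin(2.0 * x_t[0]) + 0.1 * rng.normal())
    pred = model.step(x_t, y_t)
```

The multiplicative likelihood update is one standard choice for data-adaptive weighting; the original IE-GP paper should be consulted for the actual update rule and the extension to time-varying functions.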
This list is automatically generated from the titles and abstracts of the papers on this site.