A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
- URL: http://arxiv.org/abs/2305.20028v2
- Date: Wed, 8 May 2024 10:30:22 GMT
- Title: A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
- Authors: Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson
- Abstract summary: Bayesian neural networks (BNNs) have recently become practical function approximators.
We study BNNs as alternatives to standard GP surrogates for optimization.
- Score: 46.97686790714025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.
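To make the setup concrete, below is a minimal sketch of a Bayesian optimization loop with a deep-ensemble surrogate, one of the heuristic approximate-inference methods the paper compares. The toy objective, the candidate-set acquisition strategy, and all names are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch: Bayesian optimization with a deep-ensemble surrogate.
# Assumptions (not from the paper): sklearn MLPs as ensemble members, a toy
# 1-D objective, and expected improvement evaluated on a random candidate set.
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

def objective(x):
    """Toy expensive black-box objective (illustrative only); we maximize it."""
    return np.sin(3 * x) + 0.5 * x

def fit_ensemble(X, y, n_members=5):
    """Train independently initialized MLPs; their spread acts as a crude posterior."""
    return [
        MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=seed).fit(X, y)
        for seed in range(n_members)
    ]

def expected_improvement(mu, sigma, best, eps=1e-9):
    """Standard EI for maximization, treating the ensemble mean/std as Gaussian."""
    z = (mu - best) / (sigma + eps)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))            # initial design
y = objective(X).ravel()

for step in range(20):                          # BO loop
    ensemble = fit_ensemble(X, y)
    candidates = rng.uniform(-2, 2, size=(512, 1))
    preds = np.stack([m.predict(candidates) for m in ensemble])  # (members, candidates)
    mu, sigma = preds.mean(axis=0), preds.std(axis=0)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next)[0])

print("best observed value:", y.max())
```

The ensemble spread here is only a heuristic stand-in for a posterior, which is consistent with finding (iv) above that deep ensembles tend to perform relatively poorly compared with the other surrogates studied.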
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Bayesian Kernelized Tensor Factorization as Surrogate for Bayesian Optimization [13.896697187967545]
Bayesian optimization (BO) primarily uses Gaussian processes (GP) as the key surrogate model.
In this paper, we propose to use Bayesian Kernelized Tensor Factorization (BKTF) as a new surrogate model for BO in a $D$-dimensional product space.
BKTF offers a flexible and highly effective approach for characterizing complex functions with uncertainty quantification.
arXiv Detail & Related papers (2023-02-28T12:00:21Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then select the query that maximizes the output of the trained NN (see the sketch after this list).
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling [19.518237361775533]
In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage.
This feature is useful in countless machine learning applications.
It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving.
arXiv Detail & Related papers (2022-10-03T14:59:23Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Approximate Bayesian Optimisation for Neural Networks [6.921210544516486]
A body of work has been done to automate machine learning algorithms and to highlight the importance of model choice.
Addressing both analytical tractability and computational feasibility is necessary to ensure efficiency and applicability.
arXiv Detail & Related papers (2021-08-27T19:03:32Z)
- Consistent Sparse Deep Learning: Theory and Computation [11.24471623055182]
We propose a frequentist-like method for learning sparse deep neural networks (DNNs).
The proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection.
arXiv Detail & Related papers (2021-02-25T23:31:24Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection [2.7561479348365734]
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks.
In this paper, we consider the training of DNNs, which arises in many state-of-the-art applications.
arXiv Detail & Related papers (2020-07-26T16:29:39Z)
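Regarding the Sample-Then-Optimize Batch Neural Thompson Sampling entry above, the idea of choosing a query by training an NN and then maximizing its output can be illustrated with the following simplified sketch. The random re-initialization and perturbed targets used as a stand-in for posterior sampling, the toy objective, and all names are illustrative assumptions; this is not the authors' algorithm.

```python
# A simplified sketch of the sample-then-optimize idea: train one NN per BO
# step (fresh random init, perturbed targets as a crude posterior sample) and
# pick the query that maximizes the trained network over a candidate set.
# Names and the toy objective are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.neural_network import MLPRegressor

def objective(x):
    return -np.sum((x - 0.3) ** 2, axis=-1)   # toy black-box objective (maximize)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(5, 2))           # initial design
y = objective(X)

for step in range(15):
    # "Sample" a function: fresh random init plus a small target perturbation.
    y_tilde = y + 0.05 * rng.standard_normal(y.shape)
    net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                       random_state=step).fit(X, y_tilde)
    # Choose the next query by maximizing the trained network.
    candidates = rng.uniform(-1, 1, size=(1024, 2))
    x_next = candidates[np.argmax(net.predict(candidates))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best observed value:", y.max())
```

Maximizing a single trained network per step is what lets this pattern avoid inverting a large parameter matrix, as the entry above notes.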