Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition
- URL: http://arxiv.org/abs/2009.03543v1
- Date: Tue, 8 Sep 2020 06:54:11 GMT
- Title: Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition
- Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh
- Abstract summary: Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
- Score: 63.011641517977644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an algorithm for Bayesian functional optimisation - that is,
finding the function to optimise a process - guided by experimenter beliefs and
intuitions regarding the expected characteristics (length-scale, smoothness,
cyclicity etc.) of the optimal solution encoded into the covariance function of
a Gaussian Process. Our algorithm generates a sequence of finite-dimensional
random subspaces of functional space spanned by a set of draws from the
experimenter's Gaussian Process. Standard Bayesian optimisation is applied on
each subspace, and the best solution found used as a starting point (origin)
for the next subspace. Using the concept of effective dimensionality, we
analyse the convergence of our algorithm and provide a regret bound to show
that our algorithm converges in sub-linear time provided a finite effective
dimension exists. We test our algorithm in simulated and real-world
experiments, namely blind function matching, finding the optimal
precipitation-strengthening function for an aluminium alloy, and learning rate
schedule optimisation for deep networks.
Related papers
- Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization [0.0]
We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization.
The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum.
arXiv Detail & Related papers (2024-05-02T21:04:20Z) - An Adaptive Dimension Reduction Estimation Method for High-dimensional
Bayesian Optimization [6.79843988450982]
We propose a two-step optimization framework to extend BO to high-dimensional settings.
Our algorithm offers the flexibility to operate these steps either concurrently or in sequence.
Numerical experiments validate the efficacy of our method in challenging scenarios.
arXiv Detail & Related papers (2024-03-08T16:21:08Z) - Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
New algorithms retain the ease of implementation of the classical GP-UCB, but an additional exploration step facilitates their convergence.
arXiv Detail & Related papers (2024-01-30T14:16:06Z) - Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non- optimized approximation problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Accelerated Algorithms for Convex and Non-Convex Optimization on
Manifolds [9.632674803757475]
We propose a scheme for solving convex and non- optimization problems on distance.
Our proposed algorithm adapts to the level of complexity in the objective function.
arXiv Detail & Related papers (2020-10-18T02:48:22Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - A Riemannian Primal-dual Algorithm Based on Proximal Operator and its
Application in Metric Learning [3.511851311025242]
We propose a primal-dual algorithm to optimize the primal and dual variables iteratively.
We prove convergence of the proposed algorithm and show its non-asymptotic convergence rate.
Preliminary experimental results on an optimal fund selection problem in fund of funds management showed its efficacy.
arXiv Detail & Related papers (2020-05-19T03:31:01Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In big search spaces the algorithm goes through several low function value regions before reaching the optimum of the function.
One approach to subside this cold start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that space gets expanded around the high probability region of function optimum and shrinks around low probability region of optimum.
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.