What do you Mean? The Role of the Mean Function in Bayesian Optimisation
- URL: http://arxiv.org/abs/2004.08349v2
- Date: Fri, 8 May 2020 11:54:38 GMT
- Title: What do you Mean? The Role of the Mean Function in Bayesian Optimisation
- Authors: George De Ath and Jonathan E. Fieldsend and Richard M. Everson
- Abstract summary: We show that the rate of convergence can depend sensitively on the choice of mean function.
We find that for design dimensions $\ge 5$ using a constant mean function equal to the worst observed quality value is consistently the best choice.
- Score: 0.03305438525880806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimisation is a popular approach for optimising expensive
black-box functions. The next location to be evaluated is selected via
maximising an acquisition function that balances exploitation and exploration.
Gaussian processes, the surrogate models of choice in Bayesian optimisation,
are often used with a constant prior mean function equal to the arithmetic mean
of the observed function values. We show that the rate of convergence can
depend sensitively on the choice of mean function. We empirically investigate 8
mean functions (constant functions equal to the arithmetic mean, minimum,
median and maximum of the observed function evaluations, linear and quadratic
polynomials, random forests and RBF networks), using 10 synthetic test problems
and two real-world problems, and using the Expected Improvement and Upper
Confidence Bound acquisition functions. We find that for design dimensions
$\ge5$ using a constant mean function equal to the worst observed quality value
is consistently the best choice on the synthetic problems considered. We argue
that this worst-observed-quality function promotes exploitation leading to more
rapid convergence. However, for the real-world tasks the more complex mean
functions capable of modelling the fitness landscape may be effective, although
there is no clearly optimum choice.
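As a rough, self-contained illustration of the setup the abstract describes (not the authors' implementation), the sketch below fits a Gaussian process surrogate with a configurable constant prior mean, either the arithmetic mean of the observations or the worst observed value, and ranks candidate points by Expected Improvement. All function names, the toy objective, and the hyperparameters (`rbf_kernel`, `lengthscale`, `noise`) are illustrative assumptions.

```python
# Minimal sketch: GP surrogate with a constant prior mean + Expected Improvement.
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xstar, prior_mean, noise=1e-6):
    """GP posterior mean/std at Xstar under a *constant* prior mean function.

    prior_mean can be y.mean() (the common default) or y.max(), the worst
    observed value in a minimisation setting (the choice the paper finds
    best for dimensions >= 5 on its synthetic problems).
    """
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))
    mu = prior_mean + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(Kss) - (v ** 2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best_y):
    """EI for minimisation: expected amount by which a point beats best_y."""
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 5))            # 10 observations in 5 dimensions
y = np.sum((X - 0.5) ** 2, axis=1)       # toy objective (to be minimised)
Xcand = rng.uniform(size=(1000, 5))      # candidate locations

for label, m in [("arithmetic mean", y.mean()), ("worst observed", y.max())]:
    mu, sd = gp_posterior(X, y, Xcand, prior_mean=m)
    ei = expected_improvement(mu, sd, best_y=y.min())
    print(label, "-> next point index:", int(np.argmax(ei)))
```

Swapping `prior_mean` between `y.mean()` and `y.max()` changes only the value the GP reverts to away from the data, which is exactly the lever the paper studies.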
Related papers
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- On the development of a Bayesian optimisation framework for complex unknown systems [11.066706766632578]
This paper studies and compares common Bayesian optimisation algorithms empirically on a range of synthetic test functions.
It investigates the choice of acquisition function and number of training samples, exact calculation of acquisition functions and Monte Carlo based approaches.
arXiv Detail & Related papers (2022-07-19T09:50:34Z)
- Rectified Max-Value Entropy Search for Bayesian Optimization [54.26984662139516]
We develop a rectified MES acquisition function based on the notion of mutual information.
As a result, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
arXiv Detail & Related papers (2022-02-28T08:11:02Z)
- Asymptotic convergence rates for averaging strategies [10.639022684335293]
Parallel quadratic black-box optimization consists in estimating the optimum of a function $f$ using $\lambda$ parallel evaluations of $f$.
Averaging the $\mu$ best individuals among the $\lambda$ evaluations is known to provide better estimates of the optimum than just picking the best.
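A minimal sketch of the averaging strategy described above, on a toy quadratic objective; the names and the values of $\lambda$ and $\mu$ are illustrative assumptions.

```python
# Estimate the optimum by averaging the mu best of lambda parallel evaluations.
import numpy as np

def f(x):                      # toy quadratic objective (minimise)
    return np.sum((x - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(1)
lam, mu = 200, 20
points = rng.uniform(size=(lam, 4))           # lambda parallel evaluations
order = np.argsort(f(points))
best_single = points[order[0]]                # just picking the best point
mu_average = points[order[:mu]].mean(axis=0)  # averaging the mu best points

print("best single:", f(best_single), "mu-average:", f(mu_average))
```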
arXiv Detail & Related papers (2021-08-10T14:09:46Z)
- Bayesian Optimization for Min Max Optimization [77.60508571062958]
We propose algorithms that perform Min Max Optimization in a setting where the function that should be optimized is not known a priori.
We extend the two acquisition functions Expected Improvement and Gaussian Process Upper Confidence Bound.
We show that these acquisition functions allow for better solutions - converging faster to the optimum than the benchmark settings.
arXiv Detail & Related papers (2021-07-29T06:49:34Z)
- Regret Bounds for Gaussian-Process Optimization in Large Domains [40.92207267407271]
We provide upper bounds on the suboptimality (Bayesian simple regret) of the solution found by optimization strategies.
These regret bounds illuminate the relationship between the number of evaluations, the domain size, and the optimality of the retrieved function value.
In particular, they show that even when the number of evaluations is far too small to find the global optimum, we can find nontrivial function values.
arXiv Detail & Related papers (2021-04-29T05:19:03Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Are we Forgetting about Compositional Optimisers in Bayesian Optimisation? [66.39551991177542]
This paper presents a sample-efficient methodology for global optimisation.
Within this, a crucial performance-determining subroutine is maximising the acquisition function.
We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments.
arXiv Detail & Related papers (2020-12-15T12:18:38Z)
- Bayesian Optimization of Risk Measures [7.799648230758491]
We consider Bayesian optimization of objective functions of the form $\rho[F(x, W)]$, where $F$ is a black-box expensive-to-evaluate function.
We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency.
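For a fixed design $x$, the objective form $\rho[F(x, W)]$ can be illustrated with a plug-in Monte Carlo estimate. The sketch below uses CVaR as one common choice of $\rho$ and treats $W$ as a random environmental input; it illustrates only the objective, not the authors' algorithm, and all names are assumptions.

```python
# Plug-in Monte Carlo estimate of a risk measure rho[F(x, W)] at a fixed x.
import numpy as np

def F(x, w):                             # toy stand-in for the expensive black box
    return np.sum((x - w) ** 2, axis=-1)

def cvar(samples, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of samples (minimisation setting)."""
    tail = np.sort(samples)[int(alpha * len(samples)):]
    return tail.mean()

rng = np.random.default_rng(2)
x = np.array([0.2, 0.5])
W = rng.normal(0.0, 0.3, size=(2000, 2))  # Monte Carlo draws of W
print("CVaR_0.9 estimate at x:", cvar(F(x, W)))
```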
arXiv Detail & Related papers (2020-07-10T18:20:46Z)
- Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several low-function-value regions before reaching the optimum of the function.
One approach to shortening this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space so that it expands around the high-probability region of the function optimum and shrinks around the low-probability regions.
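One plausible way to realise such a warp, given purely as an assumed illustration rather than the paper's exact construction, is to push the search interval through the CDF of the prior over the optimum's location, so that equal steps in the warped coordinate cover the high-probability region more densely.

```python
# Warp a 1-d search interval through the CDF of a prior over the optimum.
import numpy as np
from scipy.stats import norm

prior = norm(loc=0.7, scale=0.1)        # expert belief: optimum near x = 0.7

def warp(x):                            # original space [0, 1] -> warped space
    return prior.cdf(x)

def unwarp(u):                          # warped space -> original space
    return prior.ppf(u)

u_grid = np.linspace(warp(0.0), warp(1.0), 11)   # uniform grid in warped space
print(np.round(unwarp(u_grid), 3))               # dense near 0.7, sparse elsewhere
```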
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
- Composition of kernel and acquisition functions for High Dimensional Bayesian Optimization [0.1749935196721634]
We exploit the additivity of the objective function in mapping both the kernel and the acquisition function of the Bayesian Optimization (a small sketch follows this entry).
This approach makes the learning/updating of the probabilistic surrogate model more efficient.
Results are presented for a real-life application, namely the control of pumps in urban water distribution systems.
arXiv Detail & Related papers (2020-03-09T15:45:57Z)
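If the objective admits an additive decomposition over groups of variables, the kernel can likewise be written as a sum of low-dimensional kernels over those groups, which is the general idea behind the entry above. The grouping and names below are assumed for illustration only.

```python
# Additive kernel: sum of low-dimensional RBF kernels over variable groups.
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

groups = [[0, 1], [2, 3, 4]]                     # assumed additive decomposition

def additive_kernel(X1, X2):
    """Sum of per-group RBF kernels over a 5-d input."""
    return sum(rbf(X1[:, g], X2[:, g]) for g in groups)

X = np.random.default_rng(3).uniform(size=(8, 5))
print(additive_kernel(X, X).shape)               # (8, 8) Gram matrix
```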