Towards Automatic Bayesian Optimization: A first step involving
acquisition functions
- URL: http://arxiv.org/abs/2003.09643v2
- Date: Tue, 12 Jan 2021 15:48:37 GMT
- Title: Towards Automatic Bayesian Optimization: A first step involving
acquisition functions
- Authors: Eduardo C. Garrido Merchán, Luis C. Jariego Pérez
- Abstract summary: Bayesian optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions for which we do not have access to an analytical expression.
We propose a first attempt at automatic Bayesian optimization by exploring several heuristics that automatically tune the acquisition function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization is the state-of-the-art technique for the
optimization of black boxes, i.e., functions for which we have access neither
to their analytical expression nor to their gradients, that are expensive to
evaluate, and whose evaluation is noisy. The most popular application of
Bayesian optimization is the automatic hyperparameter tuning of machine
learning algorithms, where we obtain the best configuration of a machine
learning algorithm by optimizing an estimate of its generalization error.
Despite being applied with success, Bayesian optimization methodologies also
have hyperparameters that need to be configured, such as the probabilistic
surrogate model or the acquisition function used. A poor choice of these
hyperparameters leads to poor-quality results. Typically, these
hyperparameters are tuned by making assumptions about the objective function
that we want to optimize, but there are scenarios where we do not have any
prior information about the objective function. In this paper, we propose a
first attempt at automatic Bayesian optimization by exploring several
heuristics that automatically tune the acquisition function of Bayesian
optimization. We illustrate the effectiveness of these heuristics on a set of
benchmark problems and on a hyperparameter tuning problem of a machine
learning algorithm.
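As a concrete illustration of the setting (a minimal sketch, not the heuristics proposed in the paper), the loop below runs Gaussian-process-based Bayesian optimization on a toy noisy black box and, at each iteration, selects the acquisition function automatically with a simple random-choice heuristic over EI, PI, and UCB. The objective, kernel, and candidate grid are illustrative assumptions.

```python
# Minimal sketch: Bayesian optimization where the acquisition function is
# chosen automatically each iteration by a simple random-selection heuristic.
# Not the authors' method; objective and settings are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):
    # Noisy black-box objective (minimization); its form is unknown to the optimizer.
    return np.sin(3 * x) + 0.1 * x ** 2 + 0.05 * rng.standard_normal()

def acquisition(name, mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    if name == "EI":                      # expected improvement
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    if name == "PI":                      # probability of improvement
        return norm.cdf((best - mu) / sigma)
    return -mu + 2.0 * sigma              # UCB-style criterion for minimization

X = rng.uniform(-2, 2, size=(3, 1))                    # initial design
y = np.array([objective(x[0]) for x in X])
grid = np.linspace(-2, 2, 500).reshape(-1, 1)          # candidate grid

for it in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2,
                                  normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    name = rng.choice(["EI", "PI", "UCB"])             # heuristic: pick an acquisition at random
    x_next = grid[np.argmax(acquisition(name, mu, sigma, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

print("best observed value:", y.min())
```

In practice, the random choice could be replaced by any scheduling or selection heuristic over the same set of acquisition functions; the loop structure stays identical.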
Related papers
- End-to-End Learning for Fair Multiobjective Optimization Under
Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $ε$ via Gradient Magnitude Histogram Analysis [0.7366405857677226]
We introduce a new framework based on the empirical probability density function of the loss gradient's magnitude, termed the "gradient magnitude histogram".
We propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard hyperparameter.
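A hedged sketch of the general idea, under assumptions of ours rather than the paper's procedure: record the magnitudes of mini-batch gradients while training a toy linear model, build their empirical histogram, and read off low quantiles as a candidate search range for an Adam-style safeguard $ε$.

```python
# Sketch only: gradient magnitude histogram from a toy SGD run, used to
# suggest a search range for a safeguard epsilon. Model and thresholds are
# illustrative assumptions, not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=256)

w = np.zeros(10)
grad_mags = []
for step in range(500):
    idx = rng.integers(0, 256, size=32)               # mini-batch indices
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / 32  # least-squares gradient
    grad_mags.append(np.abs(grad))                    # per-coordinate gradient magnitudes
    w -= 0.01 * grad                                  # plain SGD step

mags = np.concatenate(grad_mags)
hist, edges = np.histogram(mags, bins=50)             # the "gradient magnitude histogram"
low, high = np.quantile(mags, [0.01, 0.25])            # illustrative quantile choice
print(f"candidate search range for the safeguard epsilon: [{low:.2e}, {high:.2e}]")
```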
arXiv Detail & Related papers (2023-11-20T04:34:19Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel optimization algorithm named stocBiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning [0.0]
We propose a bilevel solution method for solving the hyperparameter optimization problem.
The proposed method is general and can be easily applied to any class of machine learning algorithms.
We discuss the theory behind the proposed algorithm and perform extensive computational study on two datasets.
arXiv Detail & Related papers (2020-07-21T18:15:08Z)
- BOSH: Bayesian Optimization by Sampling Hierarchically [10.10241176664951]
We propose a novel BO routine pairing a hierarchical Gaussian process with an information-theoretic framework to generate a growing pool of realizations.
We demonstrate that BOSH provides more efficient and higher-precision optimization than standard BO across synthetic benchmarks, simulation optimization, reinforcement learning and hyperparameter tuning tasks.
arXiv Detail & Related papers (2020-07-02T07:35:49Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm goes through several low-function-value regions before reaching the optimum of the function.
One approach to alleviate this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that the space expands around the high-probability region of the function optimum and shrinks around the low-probability regions.
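One possible one-dimensional instantiation of this warping (an assumption of ours, not the paper's exact transformation): place a truncated Gaussian prior over the believed location of the optimum and push a uniform grid through its inverse CDF, so candidate points become dense where the prior puts high probability and sparse elsewhere.

```python
# Illustrative sketch: warping a 1-D search interval with a prior over the
# optimum via the inverse CDF. Prior location/scale are made-up assumptions.
import numpy as np
from scipy.stats import truncnorm

lo_b, hi_b = 0.0, 10.0                  # original search interval
mu, sd = 7.0, 1.0                       # prior belief: optimum near x = 7
a, b = (lo_b - mu) / sd, (hi_b - mu) / sd
prior = truncnorm(a, b, loc=mu, scale=sd)

u = np.linspace(0.001, 0.999, 25)       # uniform grid in [0, 1]
warped = prior.ppf(u)                   # candidates concentrated around the prior mode

print(np.round(warped, 2))
```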
arXiv Detail & Related papers (2020-03-27T06:18:49Z) - Composition of kernel and acquisition functions for High Dimensional
Bayesian Optimization [0.1749935196721634]
We exploit the additive structure of the objective function in the design of both the kernel and the acquisition function of Bayesian optimization.
This approach makes the learning/updating of the probabilistic surrogate model more efficient.
Results are presented for a real-life application, namely the control of pumps in urban water distribution systems.
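A minimal sketch of the additive idea (our assumptions, not the paper's implementation): if the objective is assumed to decompose as f(x) = f1(x_A) + f2(x_B) over disjoint groups of dimensions, the surrogate's kernel can be built as a sum of component kernels, each acting only on its own group.

```python
# Sketch: an additive kernel for a 6-D input assumed to split into the
# groups [0,1] and [2..5]; each group gets its own RBF component.
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def additive_kernel(X1, X2):
    # Sum of component kernels, one per (assumed) additive group of inputs.
    return rbf(X1[:, :2], X2[:, :2]) + rbf(X1[:, 2:], X2[:, 2:])

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 6))
K = additive_kernel(X, X)
print(K.shape)                          # (5, 5) Gram matrix usable by a GP surrogate
```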
arXiv Detail & Related papers (2020-03-09T15:45:57Z)