Efficient Bayesian Optimization with Deep Kernel Learning and
Transformer Pre-trained on Multiple Heterogeneous Datasets
- URL: http://arxiv.org/abs/2308.04660v1
- Date: Wed, 9 Aug 2023 01:56:10 GMT
- Title: Efficient Bayesian Optimization with Deep Kernel Learning and
Transformer Pre-trained on Multiple Heterogeneous Datasets
- Authors: Wenlong Lyu, Shoubo Hu, Jie Chuai, Zhitang Chen
- Abstract summary: We propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder.
Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy.
- Score: 9.510327380529892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian optimization (BO) is widely adopted in black-box optimization
problems and it relies on a surrogate model to approximate the black-box
response function. With the increasing number of black-box optimization tasks
solved and even more to solve, the ability to learn from multiple prior tasks
to jointly pre-train a surrogate model is long-awaited to further boost
optimization efficiency. In this paper, we propose a simple approach to
pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined
on deep features learned from a Transformer-based encoder, using datasets from
prior tasks with possibly heterogeneous input spaces. In addition, we provide a
simple yet effective mix-up initialization strategy for input tokens
corresponding to unseen input variables, thereby accelerating convergence on
new tasks. Experiments on both synthetic and real benchmark problems
demonstrate the effectiveness of our proposed pre-training and transfer BO
strategy over existing methods.
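The surrogate described in the abstract can be sketched concretely. The snippet below is a minimal, hypothetical illustration of a deep-kernel GP: a fixed random feature map stands in for the pre-trained Transformer encoder, an RBF kernel is evaluated on those deep features, and a `mixup_init` helper mimics the mix-up initialization of token embeddings for unseen input variables. All names, shapes, and values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "encoder": a fixed random tanh feature map standing in for
# the pre-trained Transformer encoder of the paper.
W = rng.normal(size=(8, 2))
b = rng.normal(size=8)

def encode(X):
    """Map raw inputs to deep features (placeholder for the encoder)."""
    return np.tanh(X @ W.T + b)

def rbf(A, B, ls=1.0):
    """RBF kernel evaluated on deep features (deep kernel learning)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """Standard GP posterior mean/variance under the deep kernel."""
    F, Fs = encode(X), encode(Xs)
    K = rbf(F, F) + noise * np.eye(len(X))
    Ks, Kss = rbf(Fs, F), rbf(Fs, Fs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = np.diag(Kss - Ks @ Kinv @ Ks.T)
    return mu, var

def mixup_init(emb_table, weights):
    """Mix-up initialization (sketch): embed an unseen input variable as a
    convex combination of existing variables' token embeddings."""
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ emb_table

X = rng.uniform(-1, 1, size=(10, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
Xs = rng.uniform(-1, 1, size=(5, 2))
mu, var = gp_posterior(X, y, Xs)
```

The posterior mean and variance then drive an acquisition function as in standard BO; only the kernel's input representation changes.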
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Bayesian Inverse Transfer in Evolutionary Multiobjective Optimization [29.580786235313987]
We introduce the first inverse transfer multiobjective evolutionary optimizer (invTrEMO).
InvTrEMO harnesses the common objective functions in many prevalent areas, even when decision spaces do not precisely align between tasks.
InvTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand.
arXiv Detail & Related papers (2023-12-22T14:12:18Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Transfer Learning for Bayesian Optimization: A Survey [29.229660973338145]
Black-box optimization is a powerful tool for modeling and optimizing expensive "black-box" functions.
Researchers in the BO community have proposed incorporating the spirit of transfer learning to accelerate the optimization process.
arXiv Detail & Related papers (2023-02-12T14:37:25Z) - A Data-Driven Evolutionary Transfer Optimization for Expensive Problems
in Dynamic Environments [9.098403098464704]
Data-driven, a.k.a. surrogate-assisted, evolutionary optimization has been recognized as an effective approach for tackling expensive black-box optimization problems.
This paper proposes a simple but effective transfer learning framework to empower data-driven evolutionary optimization to solve dynamic optimization problems.
Experiments on synthetic benchmark test problems and a real-world case study demonstrate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2022-11-05T11:19:50Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
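ZO-signGD, named in the summary above, can be illustrated with a short sketch: estimate the gradient from function evaluations alone via two-point finite differences along random directions, then step along the sign of the averaged estimate. The function below is a hypothetical toy implementation under these assumptions, not the paper's code.

```python
import numpy as np

def zo_sign_gd(f, x0, steps=200, lr=0.05, mu=1e-2, n_dirs=10, seed=0):
    """Zeroth-order sign gradient descent (sketch): no analytic gradients,
    only function evaluations of f."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = np.zeros_like(x)
        for _ in range(n_dirs):
            u = rng.normal(size=x.shape)
            # Two-point finite-difference estimate along random direction u.
            g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
        # Step along the sign of the averaged gradient estimate.
        x -= lr * np.sign(g / n_dirs)
    return x
```

On a simple quadratic such as `f(x) = ||x||^2`, the iterate reaches a small neighborhood of the optimum whose size is set by the fixed step `lr`.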
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
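The co-leaf similarity underlying tree-ensemble kernels can be sketched as follows: two points are similar in proportion to the number of trees that route them to the same leaf. The leaf-index matrix below is a hypothetical toy example; in practice it could come from, e.g., a fitted forest's `apply()` method. This is an illustrative sketch, not the paper's framework.

```python
import numpy as np

def tree_agreement_kernel(leaves_a, leaves_b):
    """Tree-ensemble kernel (sketch): similarity of two points is the
    fraction of trees in which they land in the same leaf.
    `leaves_*` are (n_points, n_trees) arrays of leaf indices."""
    # Compare leaf indices tree-by-tree, then average the matches.
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=-1)

# Toy leaf assignments for 3 points across 4 trees (hypothetical values).
L = np.array([[0, 1, 2, 0],
              [0, 1, 3, 1],
              [2, 0, 3, 1]])
K = tree_agreement_kernel(L, L)
```

Such a kernel is symmetric, bounded in [0, 1], and equals 1 on the diagonal, which makes it usable as a GP covariance over mixed-feature spaces.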
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, and that its performance can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to a stationary point yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Few-Shot Bayesian Optimization with Deep Kernel Surrogates [7.208515071018781]
We formulate a few-shot learning problem in which a shared deep surrogate model is trained to adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results at HPO.
arXiv Detail & Related papers (2021-01-19T15:00:39Z) - Federated Bayesian Optimization via Thompson Sampling [33.087439644066876]
This paper presents federated Thompson sampling (FTS) which overcomes a number of key challenges of FBO and FL in a principled way.
We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
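As a hedged illustration of the core idea FTS builds on, the sketch below runs vanilla Thompson sampling on Bernoulli arms: sample a plausible mean reward for each arm from its Beta posterior, play the argmax, and update. The federated aggregation that distinguishes FTS is omitted, and all arm probabilities are made-up toy values.

```python
import numpy as np

def thompson_bandit(probs, rounds=2000, seed=0):
    """Vanilla Thompson sampling on Bernoulli arms (sketch only; the
    federated aggregation step of FTS is not modeled here)."""
    rng = np.random.default_rng(seed)
    k = len(probs)
    wins = np.ones(k)    # Beta posterior: successes + 1
    losses = np.ones(k)  # Beta posterior: failures + 1
    pulls = np.zeros(k, dtype=int)
    for _ in range(rounds):
        # Sample a plausible mean reward for each arm from its posterior.
        theta = rng.beta(wins, losses)
        a = int(np.argmax(theta))
        r = rng.random() < probs[a]  # Bernoulli reward from the chosen arm
        wins[a] += r
        losses[a] += 1 - r
        pulls[a] += 1
    return pulls

pulls = thompson_bandit([0.2, 0.5, 0.8])
```

Over many rounds the posterior concentrates and the best arm is played almost exclusively, which is the exploration-exploitation behavior FTS extends to the federated setting.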
arXiv Detail & Related papers (2020-10-20T09:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.