Predicting from Strings: Language Model Embeddings for Bayesian Optimization
- URL: http://arxiv.org/abs/2410.10190v2
- Date: Tue, 15 Oct 2024 17:23:08 GMT
- Title: Predicting from Strings: Language Model Embeddings for Bayesian Optimization
- Authors: Tung Nguyen, Qiuyi Zhang, Bangding Yang, Chansoo Lee, Jorg Bornschein, Yingjie Miao, Sagi Perel, Yutian Chen, Xingyou Song
- Abstract summary: We propose Embed-then-Regress, a paradigm for applying in-context regression over string inputs.
By expressing all inputs as strings, we are able to perform general-purpose regression for Bayesian Optimization over various domains.
- Score: 21.370382766970877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Optimization is ubiquitous in the field of experimental design and blackbox optimization for improving search efficiency, but has been traditionally restricted to regression models which are only applicable to fixed search spaces and tabular input features. We propose Embed-then-Regress, a paradigm for applying in-context regression over string inputs, through the use of string embedding capabilities of pretrained language models. By expressing all inputs as strings, we are able to perform general-purpose regression for Bayesian Optimization over various domains including synthetic, combinatorial, and hyperparameter optimization, obtaining comparable results to state-of-the-art Gaussian Process-based algorithms. Code can be found at https://github.com/google-research/optformer/tree/main/optformer/embed_then_regress.
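The abstract's pipeline reduces to three steps: stringify every candidate, embed it with a pretrained language model, and run regression on the embeddings to score new points. Below is a minimal sketch of that loop; the character-trigram `embed` and the Bayesian linear regressor are toy stand-ins (assumptions, not the paper's method) for the LM embedder and in-context Transformer regressor in the linked optformer repository.

```python
# Minimal Embed-then-Regress sketch: stringified configs -> embeddings ->
# regression with uncertainty -> UCB acquisition. `embed` and the Bayesian
# linear regressor are toy stand-ins for the paper's LM embedder and
# in-context Transformer regressor.
import numpy as np

DIM = 64

def embed(s: str) -> np.ndarray:
    """Toy string embedding: hash character trigrams into a fixed vector."""
    v = np.zeros(DIM)
    for i in range(len(s) - 2):
        v[hash(s[i:i + 3]) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def ucb_pick(candidates, X, y, beta=2.0, alpha=1.0, noise=0.1):
    """Fit Bayesian linear regression on embeddings; pick the max-UCB string."""
    A = alpha * np.eye(DIM) + X.T @ X / noise**2
    cov = np.linalg.inv(A)
    mu = cov @ X.T @ y / noise**2
    Z = np.stack([embed(c) for c in candidates])
    mean = Z @ mu
    std = np.sqrt(np.einsum("ij,jk,ik->i", Z, cov, Z))  # diag(Z cov Z^T)
    return candidates[int(np.argmax(mean + beta * std))]

def f(s):  # hidden black-box objective over the string-encoded config
    lr = float(s.split(",")[0].split("=")[1])
    layers = int(s.split("=")[-1])
    return -abs(np.log10(lr) + 3) - 0.25 * abs(layers - 3)

pool = [f"lr={10**-i:.0e},layers={j}" for i in range(1, 5) for j in range(1, 5)]
history = pool[:3]                                   # initial design
for _ in range(5):
    X = np.stack([embed(s) for s in history])
    y = np.array([f(s) for s in history])
    history.append(ucb_pick([s for s in pool if s not in history], X, y))
print("best config found:", max(history, key=f))
```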
Related papers
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
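As a heavily simplified illustration of that search pattern (not the paper's actual algorithm), the sketch below performs random rollouts over a tiny expression space and rejects samples whose value fingerprints duplicate an earlier expression, a crude stand-in for equivalent-form detection.

```python
# Toy Monte Carlo search over math-expression trees with rejection of
# equivalent forms. Hypothetical simplification of the summary above.
import random

UNARY, BINARY, LEAVES = ["neg", "sq"], ["add", "mul"], ["x", "1"]

def rollout(depth=2):
    """Sample a random expression tree as a nested tuple."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(LEAVES)
    if random.random() < 0.5:
        return (random.choice(UNARY), rollout(depth - 1))
    return (random.choice(BINARY), rollout(depth - 1), rollout(depth - 1))

def evaluate(e, x):
    if e == "x": return x
    if e == "1": return 1.0
    op = e[0]
    if op == "neg": return -evaluate(e[1], x)
    if op == "sq":  return evaluate(e[1], x) ** 2
    if op == "add": return evaluate(e[1], x) + evaluate(e[2], x)
    return evaluate(e[1], x) * evaluate(e[2], x)

# Crude equivalent-form detection: fingerprint each expression by its
# values on a probe grid; reject rollouts that duplicate a fingerprint.
probes = [-1.0, 0.5, 2.0]
seen, kept = set(), []
while len(kept) < 10:
    e = rollout()
    key = tuple(round(evaluate(e, p), 6) for p in probes)
    if key in seen:
        continue          # rejection sampling: equivalent form, discard
    seen.add(key)
    kept.append(e)
print(kept)
```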
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions are placed on the parameters through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
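The reference implementation is in R (package probe). As a language-agnostic illustration of the plug-in empirical Bayes idea, estimating prior hyperparameters from the data instead of fixing them, here is a sketch using scikit-learn's ARD regression; this is a generic stand-in, not the paper's partitioned ECM algorithm.

```python
# Empirical-Bayes-style sparse high-dimensional regression demo.
# ARDRegression tunes per-coefficient prior precisions by maximizing the
# marginal likelihood -- the same "learn the prior from data" idea, though
# not the paper's partitioned ECM algorithm (R package `probe`).
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
n, p, k = 100, 200, 5                 # high-dimensional: p >> true support k
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0                        # sparse ground truth
y = X @ beta + 0.5 * rng.standard_normal(n)

model = ARDRegression().fit(X, y)
top = np.argsort(-np.abs(model.coef_))[:k]
print("recovered support:", sorted(top))   # expect ~ [0, 1, 2, 3, 4]
```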
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
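"Random feature-based kernel approximation" typically means random Fourier features, which make posterior function samples cheap: a sample of the whole function is just a finite weight vector. A single-GP Thompson sampling sketch of that idea follows; EGP-TS applies it per GP model in an ensemble.

```python
# Thompson sampling from an approximate GP posterior via random Fourier
# features (RFF). Single-GP sketch of the scalable-sampling idea; the
# paper's EGP-TS does this per GP model in an ensemble.
import numpy as np

rng = np.random.default_rng(1)
M, noise, lengthscale = 100, 0.1, 0.3

# RFF for an RBF kernel: phi(x) = sqrt(2/M) * cos(W x + b)
W = rng.standard_normal((M, 1)) / lengthscale
b = rng.uniform(0, 2 * np.pi, M)
phi = lambda x: np.sqrt(2.0 / M) * np.cos(x @ W.T + b)

f = lambda x: np.sin(3 * x) + 0.1 * rng.standard_normal(x.shape)  # black box
X = rng.uniform(0, 2, (5, 1))                                     # init design
y = f(X).ravel()

for step in range(10):
    P = phi(X)                                        # n x M feature matrix
    A = P.T @ P / noise**2 + np.eye(M)
    cov = np.linalg.inv(A)
    mean = cov @ P.T @ y / noise**2
    w = rng.multivariate_normal(mean, cov)            # one posterior sample
    grid = np.linspace(0, 2, 200).reshape(-1, 1)
    x_next = grid[np.argmax(phi(grid) @ w)].reshape(1, 1)  # maximize sample
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())
print("best observed:", X[np.argmax(y)].item(), y.max())
```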
arXiv Detail & Related papers (2022-05-27T16:43:10Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes [0.0]
We adapt the Vecchia approximation, a popular GP approximation from spatial statistics, to enable scalable high-dimensional Bayesian optimization.
We focus on the use of our warped Vecchia GP in trust-region Bayesian optimization via Thompson sampling.
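The Vecchia approximation factorizes the GP likelihood into univariate conditionals, each conditioning on only m nearest previously ordered points instead of all of them, which breaks the usual cubic scaling. A bare-bones log-likelihood sketch follows; the paper's warped Vecchia GP and trust-region machinery are not reproduced here.

```python
# Bare-bones Vecchia approximation of a GP log-likelihood: each point
# conditions on its m nearest predecessors only. Illustrative sketch; the
# paper adds input warping and trust-region Thompson sampling on top.
import numpy as np

def rbf(A, B, ls=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def vecchia_loglik(X, y, m=5, noise=1e-4):
    ll = 0.0
    for i in range(len(X)):
        if i == 0:
            mu, var = 0.0, 1.0 + noise
        else:
            d = ((X[:i] - X[i]) ** 2).sum(-1)
            nb = np.argsort(d)[:m]                   # m nearest predecessors
            Knn = rbf(X[nb], X[nb]) + noise * np.eye(len(nb))
            kin = rbf(X[i:i + 1], X[nb]).ravel()
            sol = np.linalg.solve(Knn, kin)
            mu = sol @ y[nb]                         # conditional mean
            var = 1.0 + noise - sol @ kin            # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (200, 3))
y = np.sin(X.sum(-1)) + 0.01 * rng.standard_normal(200)
print("Vecchia log-likelihood:", vecchia_loglik(X, y))
```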
arXiv Detail & Related papers (2022-03-02T23:55:14Z)
- Fourier Representations for Black-Box Optimization over Categorical Variables [34.0277529502051]
We propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables.
To learn such representations, we consider two different settings to update our surrogate model.
Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods.
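In the binary special case, the Fourier representation is the Walsh basis: any pseudo-Boolean objective is a linear combination of parity monomials, so a surrogate can be fit by plain linear regression on low-order monomial features. The sketch below covers only that binary case, with an invented toy objective.

```python
# Surrogate over binary variables via a low-order Walsh / Fourier basis:
# features are parity monomials prod_{i in S} s_i with |S| <= 2.
# Binary-case sketch of the representation named in the summary.
import itertools
import numpy as np

d = 8  # binary variables encoded as s_i in {-1, +1}
SUBSETS = [()] + [(i,) for i in range(d)] \
        + list(itertools.combinations(range(d), 2))

def features(S):
    """Map rows of +/-1 assignments to Walsh monomial features."""
    return np.array([[np.prod(row[list(sub)]) for sub in SUBSETS] for row in S])

f = lambda s: s[0] * s[1] - 2 * s[2] + s[3] * s[7]   # hidden toy objective

rng = np.random.default_rng(3)
S = rng.choice([-1.0, 1.0], size=(60, d))            # random evaluations
y = np.array([f(s) for s in S])
coef, *_ = np.linalg.lstsq(features(S), y, rcond=None)  # fit surrogate

# Score unseen configurations with the cheap surrogate instead of f.
cand = rng.choice([-1.0, 1.0], size=(1000, d))
best = cand[np.argmax(features(cand) @ coef)]
print("surrogate argmax:", best, "true value:", f(best))
```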
arXiv Detail & Related papers (2022-02-08T08:14:58Z)
- Triangulation candidates for Bayesian optimization [0.3222802562733786]
Bayesian optimization is a form of sequential design that idealizes input-output relationships with a suitably flexible regression model.
Here we propose using candidates based on a Delaunay triangulation, constructed with a simple, conventional convex hull library.
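Concretely, one way to generate such candidates is to triangulate the current design and take the barycenter of each simplex, as in the scipy-based sketch below; Qhull, the convex hull code behind scipy's Delaunay, is presumably the kind of conventional machinery meant.

```python
# Candidate generation from a Delaunay triangulation of the existing
# design: one barycenter per simplex. Sketch using scipy's Qhull-backed
# triangulation; a BO loop would score `candidates` with its acquisition
# function and evaluate the best one.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (12, 2))               # existing BO input design

tri = Delaunay(X)                            # conventional convex hull code
candidates = X[tri.simplices].mean(axis=1)   # barycenter of each triangle
print(len(candidates), "candidates from", len(X), "design points")
```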
arXiv Detail & Related papers (2021-12-14T15:13:31Z)
- Text Counterfactuals via Latent Optimization and Shapley-Guided Search [15.919650185010491]
We study the problem of generating counterfactual text for a classification model.
We aim to minimally alter the text to change the model's prediction.
White-box approaches have been successfully applied to similar problems in vision.
arXiv Detail & Related papers (2021-10-22T05:04:40Z)
- An AI-Assisted Design Method for Topology Optimization Without Pre-Optimized Training Data [68.8204255655161]
An AI-assisted design method based on topology optimization is presented, which is able to obtain optimized designs in a direct way.
Designs are provided by an artificial neural network, the predictor, on the basis of boundary conditions and degree of filling as input data.
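Schematically, such a predictor is a network mapping an encoded boundary-condition vector plus a fill-fraction scalar directly to a density design field. The untrained plain-numpy stand-in below only illustrates the shape of that input-to-output mapping; all sizes and the architecture are invented for illustration.

```python
# Schematic "predictor": boundary conditions + degree of filling -> density
# field. Architecture and tensor shapes are illustrative assumptions, not
# the paper's model, and the weights are untrained.
import numpy as np

rng = np.random.default_rng(5)
GRID = 16 * 16                         # flattened design grid

def mlp(x, sizes):
    """Plain-numpy forward pass through a small ReLU MLP."""
    for i, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.standard_normal((m, n)) * 0.1   # untrained demo weights
        x = np.maximum(x @ W, 0.0) if i < len(sizes) - 2 else x @ W
    return 1.0 / (1.0 + np.exp(-x))    # material densities in (0, 1)

bc = rng.standard_normal(32)           # encoded boundary conditions
fill = np.array([0.4])                 # target degree of filling
design = mlp(np.concatenate([bc, fill]), [33, 128, GRID])
print(design.shape)                    # (256,) density field
```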
arXiv Detail & Related papers (2020-12-11T14:33:27Z)
- BOSS: Bayesian Optimization over String Spaces [15.630421177117634]
This article develops a Bayesian optimization (BO) method which acts directly over raw strings.
It proposes the first uses of string kernels and genetic algorithms within BO loops.
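A string kernel gives the GP a similarity measure defined directly on raw strings. The crude n-gram intersection kernel below is a toy stand-in for the sub-sequence string kernels used in BOSS, which also optimizes the acquisition function over strings with genetic algorithms.

```python
# Crude string kernel for GP regression over raw strings: similarity =
# normalized count of shared character n-grams. Toy stand-in for the
# sub-sequence string kernels proposed in BOSS.
import numpy as np

def ngrams(s, n=3):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def k_string(a, b, n=3):
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    shared = sum(min(ga.count(g), gb.count(g)) for g in set(ga))
    return shared / np.sqrt(len(ga) * len(gb))

# GP posterior mean over strings with this kernel.
train = ["ACGTACGT", "TTTTACGT", "GGGGCCCC"]
y = np.array([1.0, 0.5, -1.0])
K = np.array([[k_string(a, b) for b in train] for a in train]) + 1e-6 * np.eye(3)
alpha = np.linalg.solve(K, y)

test = "ACGTTTTT"
k_star = np.array([k_string(test, b) for b in train])
print("posterior mean at", test, "=", k_star @ alpha)
```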
arXiv Detail & Related papers (2020-10-02T13:18:27Z)
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation for Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)