TS-RSR: A provably efficient approach for batch bayesian optimization
- URL: http://arxiv.org/abs/2403.04764v3
- Date: Thu, 2 May 2024 14:16:35 GMT
- Title: TS-RSR: A provably efficient approach for batch bayesian optimization
- Authors: Zhaolin Ren, Na Li,
- Abstract summary: This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling.
Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points.
We demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions.
- Score: 4.622871908358325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling (TS-RSR), where we sample a new batch of actions by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. Theoretically, we provide rigorous convergence guarantees on our algorithm's regret, and numerically, we demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions, where it outperforms several competitive benchmark batch BO algorithms.
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Batch Ensemble for Variance Dependent Regret in Stochastic Bandits [41.95653110232677]
Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL)
Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that achieves near-optimal regret for Multi-Armed Bandits (MAB)
Our algorithm has just a single parameter namely the number of batches, and its value does not depend on distributional properties such as the scale and variance of the losses.
arXiv Detail & Related papers (2024-09-13T06:40:56Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD)
We use gradient new outliers as norm clipping chains.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Learning Rate Free Sampling in Constrained Domains [21.853333421463603]
We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free.
We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex.
arXiv Detail & Related papers (2023-05-24T09:31:18Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models.
We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.