Nonparametric Bayesian Optimization for General Rewards
- URL: http://arxiv.org/abs/2602.07411v1
- Date: Sat, 07 Feb 2026 07:01:33 GMT
- Title: Nonparametric Bayesian Optimization for General Rewards
- Authors: Zishi Zhang, Tao Ren, Yijie Peng
- Abstract summary: We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function. We develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
- Score: 4.696963700743491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function and accommodating a broad class of measurement noise. The core of our approach is a novel surrogate model, termed the infinite Gaussian process ($\infty$-GP). It is a Bayesian nonparametric model that places a prior on the space of reward distributions, enabling it to represent a substantially broader class of reward models than the classical Gaussian process (GP). The $\infty$-GP is used in combination with Thompson Sampling (TS) to enable effective exploration and exploitation. Correspondingly, we develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Furthermore, with a truncated Gibbs sampling procedure, our method is computationally scalable, incurring minimal additional memory and computational complexity compared to a classical GP. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
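The abstract does not include an implementation, but the Thompson Sampling loop it describes can be illustrated with a minimal sketch. The snippet below substitutes a classical GP surrogate for the paper's $\infty$-GP (which is not reproduced here); the 1-D objective, kernel, and hyperparameters are illustrative assumptions only.

```python
import numpy as np

# Minimal Thompson Sampling loop on a 1-D candidate grid, using a classical
# GP surrogate as a stand-in for the paper's infinite-GP model (hypothetical
# objective and hyperparameters chosen only for illustration).

rng = np.random.default_rng(0)

def objective(x):                       # unknown reward; noisy observations
    return np.sin(3 * x) + 0.1 * rng.standard_normal(x.shape)

def rbf(a, b, length=0.3, var=1.0):     # squared-exponential kernel
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

grid = np.linspace(0.0, 2.0, 200)       # candidate points
noise = 0.1 ** 2
X, y = [], []

for t in range(30):
    if X:
        Xt, yt = np.array(X), np.array(y)
        K = rbf(Xt, Xt) + noise * np.eye(len(Xt))
        Ks = rbf(grid, Xt)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, yt))
        mu = Ks @ alpha                                   # posterior mean
        v = np.linalg.solve(L, Ks.T)
        cov = rbf(grid, grid) - v.T @ v                   # posterior covariance
    else:
        mu, cov = np.zeros_like(grid), rbf(grid, grid)    # prior
    # Thompson Sampling: draw one posterior sample of the reward surface
    # and query the point where that sample is maximal.
    f_sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    x_next = grid[np.argmax(f_sample)]
    X.append(x_next)
    y.append(objective(np.array([x_next]))[0])

print("best observed reward:", max(y))
```

Swapping the GP posterior sample for a draw from the $\infty$-GP surrogate, and the Gaussian noise model for heavy-tailed or non-stationary rewards, is where the paper's contribution lies; the acquisition step itself stays the same TS argmax shown above.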
Related papers
- Thompson Sampling via Fine-Tuning of LLMs [68.1722422968855]
We propose an alternative based on Thompson sampling that eliminates the need for scalable acquisition functions. Our approach, Thompson Sampling via Fine-Tuning (ToSFiT), leverages the prior knowledge embedded in prompt-conditioned language models and adapts incrementally toward the posterior. Our analysis reveals the critical role of careful adaptation to the posterior probability of maximality, a principle that underpins our ToSFiT algorithm.
arXiv Detail & Related papers (2025-10-15T09:13:59Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight.
arXiv Detail & Related papers (2024-07-15T17:59:52Z) - Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes [8.198195852439946]
We study preferential Bayesian optimization (BO), where reliable feedback is limited to pairwise comparisons called duels.
An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent flexible preference structure, is that the posterior distribution is a computationally intractable skew GP.
We develop a new method that achieves both high computational efficiency and low sample complexity, and then demonstrate its effectiveness through extensive numerical experiments.
arXiv Detail & Related papers (2023-02-03T03:02:38Z) - Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
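The random feature-based kernel approximation mentioned in the EGP-TS summary above typically refers to random Fourier features. Below is a minimal, self-contained sketch of that approximation for an RBF kernel; the lengthscale, feature count, and test data are illustrative assumptions, not details taken from that paper.

```python
import numpy as np

# Minimal random Fourier feature (RFF) approximation of an RBF kernel,
# the kind of random-feature trick used for scalable posterior function
# sampling (lengthscale and feature count here are illustrative).

rng = np.random.default_rng(1)
d, m, length = 2, 500, 0.5                # input dim, feature count, lengthscale

W = rng.standard_normal((m, d)) / length  # spectral frequencies of the RBF kernel
b = rng.uniform(0.0, 2 * np.pi, size=m)   # random phases

def phi(X):
    # Feature map such that phi(x) @ phi(x') approximates k(x, x').
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

X = rng.standard_normal((5, d))
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / length ** 2)
K_approx = phi(X) @ phi(X).T
print(np.max(np.abs(K_exact - K_approx)))  # small for large m
```

With this feature map, a posterior function sample reduces to a random draw of linear weights over phi, which is what makes Thompson Sampling scalable in such ensemble-GP methods.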
arXiv Detail & Related papers (2022-05-27T16:43:10Z) - Global convergence of optimized adaptive importance samplers [0.0]
We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals.
We derive non-asymptotic bounds for the global gradient of the $\chi^2$-divergence for proposals.
arXiv Detail & Related papers (2022-01-02T19:56:36Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC [0.24275655667345403]
We model the log-likelihood function using a Gaussian process (GP).
The main methodological innovation is to apply this model to emulate the progression that an exact Metropolis-Hastings (MH) sampler would take.
The resulting approximate sampler is conceptually simple and sample-efficient.
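To make the emulation idea in this entry concrete, here is a generic random-walk Metropolis-Hastings loop in which the log-posterior is supplied by a surrogate; the surrogate below is a hypothetical placeholder rather than that paper's GP emulator or its acquisition logic.

```python
import numpy as np

# Generic random-walk Metropolis-Hastings where the (expensive or noisy)
# log-posterior is replaced by a surrogate; the surrogate below is a
# hypothetical placeholder, not the paper's GP emulator.

rng = np.random.default_rng(2)

def surrogate_log_post(theta):
    # Placeholder for a GP posterior-mean prediction of the log-posterior.
    return -0.5 * np.sum((theta - 1.0) ** 2)

theta = np.zeros(2)
samples = []
for _ in range(5000):
    prop = theta + 0.5 * rng.standard_normal(theta.shape)   # random-walk proposal
    log_accept = surrogate_log_post(prop) - surrogate_log_post(theta)
    if np.log(rng.uniform()) < log_accept:                  # MH accept/reject
        theta = prop
    samples.append(theta.copy())

print("posterior mean estimate:", np.mean(samples, axis=0))
```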
arXiv Detail & Related papers (2021-04-08T17:38:02Z) - Likelihood-Free Inference with Deep Gaussian Processes [70.74203794847344]
Surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations.
We propose a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions.
Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions and maintain a comparable performance in unimodal cases.
arXiv Detail & Related papers (2020-06-18T14:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.