Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
- URL: http://arxiv.org/abs/2505.23673v1
- Date: Thu, 29 May 2025 17:17:29 GMT
- Title: Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
- Authors: Aya Kayal, Sattar Vakili, Laura Toni, Da-shan Shiu, Alberto Bernacchia,
- Abstract summary: We refer to this problem as Bayesian Optimization from Human Feedback (HF), which differs from the best actions from a reduced feedback model.<n>The objective is to identify the best action using a limited preference framework.<n>In other words, the same number of preferential emerging samples as scalar-valued samples is sufficient to find a nearly optimal solution.
- Score: 20.024434010891433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain$\unicode{x2014}$a kernel-specific complexity term$\unicode{x2014}$and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO$\unicode{x2014}$achieved with richer feedback models$\unicode{x2014}$are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
Related papers
- Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling [18.852567298468742]
We propose a completely in-context zero-shot solution for BO that does not require surrogate fitting or acquisition function optimization.<n>This is done by using a pre-trained context model to directly sample from the posterior over the optimum point.<n>We achieve an efficiency gain of more than 35x in terms of wall-clock time when compared with process-based BO.
arXiv Detail & Related papers (2025-05-29T18:07:36Z) - Markov Chain-based Optimization Time Analysis of Bivalent Ant Colony Optimization for Sorting and LeadingOnes [0.0]
We show that the ratio of the two pheromone values significantly governs the runtime behavior of Bivalent ACO (BACO)
We show that despite we have a drastically simplified ant algorithm with respect to the influence of the pheromones on the solving process, known bounds on the expected optimization time for the problems OneMax ($O(nlog n)$) and LeadingOnes ($O(n2)$) can be re-produced as a by-product of our approach.
arXiv Detail & Related papers (2024-05-06T11:02:50Z) - Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits [12.674929126684528]
We address the problem of semi-bandits, where a player selects among P actions from the power set of a set containing d base items.
We show that our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches.
arXiv Detail & Related papers (2024-02-23T08:07:54Z) - Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring [41.20252349191698]
Partial monitoring is a generic framework of online decision-making problems with limited feedback.<n>We present a new framework and analysis for ExO with a hybrid regularizer.<n>In particular, we derive a regret bound of $O(sum_a neq a*)$, where $k$, $m$, and $T$ are the numbers of actions, observations and rounds.
arXiv Detail & Related papers (2024-02-13T09:34:22Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Introducing the Expohedron for Efficient Pareto-optimal Fairness-Utility
Amortizations in Repeated Rankings [9.066817876491053]
We consider the problem of computing a sequence of rankings that maximizes consumer-side utility while minimizing producer-side individual unfairness of exposure.
We introduce a geometrical object, a polytope that we call expohedron, whose points represent all achievable exposures of items for a Position Based Model.
Our approach compares favorably to linear or quadratic programming baselines in terms of algorithmic complexity and empirical runtime.
Our solution can be expressed as a distribution over only $n$ permutations, instead of the $(n-1)2 + 1$ achieved with BvN decompositions.
arXiv Detail & Related papers (2022-02-07T14:43:35Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $Oast(sqrtTgamma_T)$ using $O(loglog T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $mathcalO(fraclogNsqrtN)$ to $mathcal O(e-sqrtN)$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $mathcal O(N-sqrt
arXiv Detail & Related papers (2021-05-10T13:07:44Z) - Bayesian Algorithm Execution: Estimating Computable Properties of
Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.