Optimal Regret Is Achievable with Bounded Approximate Inference Error:
An Enhanced Bayesian Upper Confidence Bound Framework
- URL: http://arxiv.org/abs/2201.12955v4
- Date: Fri, 10 Nov 2023 03:01:23 GMT
- Authors: Ziyi Huang, Henry Lam, Amirhossein Meisami, Haofeng Zhang
- Abstract summary: We propose an Enhanced Bayesian Upper Confidence Bound (EBUCB) framework that can efficiently accommodate bandit problems in the presence of approximate inference.
We show that EBUCB can achieve the optimal regret order $O(\log T)$ if the inference error measured by two different $\alpha$-divergences is less than a constant.
Our study provides the first theoretical regret bound that is better than $o(T)$ in the setting of constant approximate inference error.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian bandit algorithms with approximate Bayesian inference have been
widely used in real-world applications. However, there is a large discrepancy
between the superior practical performance of these approaches and their
theoretical justification. Previous research only indicates a negative
theoretical result: Thompson sampling could have a worst-case linear regret
$\Omega(T)$ with a constant threshold on the inference error measured by one
$\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper
Confidence Bound (EBUCB) framework that can efficiently accommodate bandit
problems in the presence of approximate inference. Our theoretical analysis
demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the
optimal regret order $O(\log T)$ if the inference error measured by two
different $\alpha$-divergences is less than a constant, regardless of how large
this constant is. To the best of our knowledge, our study provides the first
theoretical regret bound that is better than $o(T)$ in the setting of constant
approximate inference error. Furthermore, in concordance with the negative
results in previous studies, we show that only one bounded $\alpha$-divergence
is insufficient to guarantee a sub-linear regret.
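For concreteness, below is a minimal Bayes-UCB-style loop for Bernoulli bandits with exact Beta posteriors. It is a sketch of the baseline that EBUCB builds on, not EBUCB itself: the function name, the $1 - 1/t$ quantile level, and the use of an exact posterior are illustrative assumptions, whereas EBUCB works with an approximate posterior whose error is bounded in two $\alpha$-divergences.

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb_bernoulli(true_means, T, seed=0):
    """Bayes-UCB-style loop for Bernoulli bandits (sketch, not EBUCB).

    Each arm keeps an exact Beta(a, b) posterior and is scored by an
    upper posterior quantile; EBUCB instead handles an *approximate*
    posterior whose error is bounded in two alpha-divergences.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    a = np.ones(k)  # posterior successes + 1
    b = np.ones(k)  # posterior failures + 1
    for t in range(1, T + 1):
        # 1 - 1/t quantile: the classical Bayes-UCB index level.
        index = beta.ppf(1.0 - 1.0 / t, a, b)
        arm = int(np.argmax(index))
        reward = float(rng.random() < true_means[arm])
        a[arm] += reward
        b[arm] += 1.0 - reward
    return a, b

# Example: two arms with means 0.5 and 0.6 over a short horizon.
print(bayes_ucb_bernoulli([0.5, 0.6], T=1000))
```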
Related papers
- Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
We show that Linear Thompson sampling (LinTS) and the extension of Bayesian Upper Confidence Bound (LinBUCB) can preserve their original regret upper bound rates under approximate inference.
We also show that LinBUCB expedites the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate.
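A minimal sketch of one Linear Thompson Sampling round, assuming a Gaussian posterior with ridge mean and covariance proportional to $V^{-1}$; the helper name and the scale parameter `v2` are illustrative. The paper's question is whether such a step stays near-optimal when the sampled posterior is only approximate.

```python
import numpy as np

def lin_ts_step(V, b, arm_feats, v2=1.0, rng=None):
    """One round of Linear Thompson Sampling (a generic sketch).

    V is the regularized design matrix (lambda*I + sum of x x^T), b the
    accumulated reward-weighted features, arm_feats an (n_arms, d) array.
    LinBUCB would instead score arms by a posterior quantile rather than
    a single posterior sample.
    """
    rng = rng or np.random.default_rng(0)
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                                   # ridge estimate
    theta_sample = rng.multivariate_normal(theta_hat, v2 * V_inv)
    return int(np.argmax(arm_feats @ theta_sample))

# Example: d = 3, five random arms, untrained statistics.
rng = np.random.default_rng(1)
print(lin_ts_step(np.eye(3), np.zeros(3), rng.standard_normal((5, 3))))
```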
arXiv Detail & Related papers (2024-06-20T07:45:38Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
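A hedged sketch of an optimistic pair-selection step for dueling bandits under a logistic link; the empirical-best-versus-optimistic-challenger rule shown here is a generic construction, not the paper's layered SupLinUCB-type algorithm, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def select_duel(theta_hat, V_inv, feats, alpha=1.0):
    """Pick a dueling pair optimistically (generic sketch).

    Comparisons are modeled through difference features
    x = feats[b] - feats[a] passed through a GLM link (logistic here).
    """
    a = int(np.argmax(feats @ theta_hat))       # empirical best arm
    best_b, best_ucb = a, -np.inf
    for b in range(len(feats)):
        if b == a:
            continue
        x = feats[b] - feats[a]
        width = alpha * np.sqrt(x @ V_inv @ x)  # elliptical confidence width
        ucb = sigmoid(x @ theta_hat) + width    # optimistic win probability
        if ucb > best_ucb:
            best_b, best_ucb = b, ucb
    return a, best_b
```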
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best-arm identification in bandits.
We provide three types of theoretical guarantees for EB-TC$\varepsilon$.
We show through numerical simulations that EB-TC$\varepsilon$ performs favorably compared to existing algorithms.
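A sketch of one step of a generic top-two sampling rule in the spirit of EB-TC$\varepsilon$, assuming Gaussian rewards and at least one pull per arm; the transportation-cost statistic and the fixed leader probability beta = 0.5 are illustrative simplifications, not the paper's exact rule.

```python
import numpy as np

def top_two_step(means, counts, eps=0.1, beta=0.5, rng=None):
    """One step of a generic top-two sampling rule (EB-TC-epsilon-style).

    The leader is the empirical best arm; the challenger minimizes a
    transportation-cost-style statistic with an eps slack (Gaussian
    rewards assumed, counts assumed >= 1).
    """
    rng = rng or np.random.default_rng(0)
    leader = int(np.argmax(means))

    def cost(j):  # smaller = harder to distinguish from the leader
        gap = means[leader] - means[j] + eps
        return gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[j]))

    challenger = min((j for j in range(len(means)) if j != leader), key=cost)
    # Sample the leader with probability beta, else the challenger.
    return leader if rng.random() < beta else challenger
```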
arXiv Detail & Related papers (2023-05-25T13:19:11Z)
- Pseudonorm Approachability and Applications to Regret Minimization
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
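As a reference point, a short LaTeX statement of the Blackwell approachability goal; the payoff notation $u(x_t, y_t)$ is illustrative, and $\ell_\infty$-approachability simply measures the distance in the $\ell_\infty$ norm.

```latex
% Blackwell approachability (illustrative notation): a convex set S is
% approachable if the running average of the vector payoff u(x_t, y_t)
% can be driven to S against any opponent:
\[
  \lim_{T \to \infty}
  \operatorname{dist}\Big( \tfrac{1}{T} \sum_{t=1}^{T} u(x_t, y_t),\; S \Big) = 0
  \quad \text{almost surely.}
\]
% The paper converts the high-dimensional l_infty version of this goal
% into a low-dimensional pseudonorm approachability problem.
```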
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
We study combinatorial multi-armed bandits with semi-bandit feedback (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
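A minimal CUCB-style round for combinatorial semi-bandits, assuming rewards in [0, 1] and an application-specific oracle; the exploration constant and the top-K oracle in the example are illustrative, and BCUCB-T replaces this Hoeffding-style bonus with variance-aware intervals.

```python
import numpy as np

def cucb_step(t, means, counts, oracle):
    """One round of combinatorial UCB with semi-bandit feedback (generic
    CUCB sketch; BCUCB-T sharpens this with variance-aware intervals).

    oracle: an application-specific solver that maps per-arm optimistic
    estimates to a feasible super-arm, e.g. the top-K selector below.
    """
    bonus = np.sqrt(1.5 * np.log(t) / np.maximum(counts, 1))
    ucb = np.minimum(means + bonus, 1.0)   # rewards assumed in [0, 1]
    return oracle(ucb)

# Example oracle: choose the K = 2 base arms with the largest indices.
top2 = lambda ucb: np.argsort(ucb)[-2:]
print(cucb_step(t=10, means=np.array([0.2, 0.5, 0.4]),
                counts=np.array([3, 3, 4]), oracle=top2))
```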
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
- First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
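For orientation, a LaTeX statement of what a first-order regret bound asserts; the snippet suppresses the polynomial dependence on dimension and horizon that the full bound carries.

```latex
% First-order (small-loss) regret over K episodes: the bound scales with
% the optimal value V_1^* rather than a worst-case range, so it is small
% whenever the optimal policy itself collects little reward.
\[
  \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \big( V_1^{\star} - V_1^{\pi_k} \big)
  \;\le\; \widetilde{\mathcal{O}}\big( \sqrt{V_1^{\star} K} \big).
\]
```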
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
- Linear Contextual Bandits with Adversarial Corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
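A hedged sketch of optimistic selection with a corruption-inflated confidence radius; enlarging the width by a term that grows with the corruption budget $C$ is a standard device for keeping the true parameter inside the confidence set under contamination, not the paper's exact variance-aware rule.

```python
import numpy as np

def corruption_aware_ucb(theta_hat, V_inv, feats, beta, C=0.0):
    """Optimistic arm choice with a corruption-inflated radius (sketch).

    beta is the usual elliptical confidence width; adding C widens the
    confidence set to absorb adversarially contaminated rewards.
    """
    # Row-wise x^T V^{-1} x for every arm feature vector.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', feats, V_inv, feats))
    ucb = feats @ theta_hat + (beta + C) * widths
    return int(np.argmax(ucb))
```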
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- Sharp regret bounds for empirical Bayes and compound decision problems
In a Bayes setting the optimal estimator is given by the prior-dependent conditional mean.
We show that for the Poisson model with compactly supported and subexponential priors, the optimal regret scales as $\Theta\big((\frac{\log n}{\log\log n})^2\big)$ and $\Theta(\log^3 n)$, respectively.
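In illustrative notation, the regret being bounded is the worst-case excess of the estimator's risk over the Bayes envelope attained by the prior-dependent conditional mean:

```latex
% Regret of an empirical-Bayes estimator \hat{f} over a prior class P
% (illustrative notation): its excess squared-error risk above the
% Bayes envelope, in the worst case over priors in the class.
\[
  \mathrm{Regret}_n(\hat{f}; \mathcal{P})
  \;=\; \sup_{\pi \in \mathcal{P}}
  \Big\{ \mathbb{E}_{\pi} \big\| \hat{f}(Y) - \theta \big\|^{2}
        - \mathbb{E}_{\pi} \big\| \mathbb{E}[\theta \mid Y] - \theta \big\|^{2} \Big\}.
\]
```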
arXiv Detail & Related papers (2021-09-08T21:34:47Z)
- Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward is maximized while each policy's expected cost remains below a given threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
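A sketch of the optimistic-pessimistic idea behind OPLB, assuming a known cost threshold `tau` and separate ridge estimates for the reward and cost parameters; OPLB itself operates over policies rather than the single-arm rule shown here, so all names below are illustrative.

```python
import numpy as np

def oplb_style_step(theta_r, theta_c, V_inv, feats, beta, tau):
    """Optimistic-pessimistic selection for constrained linear bandits
    (a sketch of the idea: optimism on reward, pessimism on cost).
    """
    widths = np.sqrt(np.einsum('ij,jk,ik->i', feats, V_inv, feats))
    reward_ucb = feats @ theta_r + beta * widths   # optimistic reward
    cost_ucb = feats @ theta_c + beta * widths     # pessimistic (upper) cost
    feasible = np.where(cost_ucb <= tau)[0]
    if len(feasible) == 0:
        return int(np.argmin(cost_ucb))            # fall back to safest arm
    return int(feasible[np.argmax(reward_ucb[feasible])])
```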
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Differentiable Linear Bandit Algorithm
Upper Confidence Bound is arguably the most commonly used method for linear multi-armed bandit problems.
We introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent.
We show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound on the $T$-round regret, where $d$ is the dimension of arm features and $\hat{\beta}$ is the learned size of the confidence bound.
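An illustrative REINFORCE-style stand-in for learning the confidence width by gradient ascent: the softmax relaxation and score-function gradient below are assumptions made for this sketch, not the paper's gradient estimator.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def differentiable_ucb_step(beta_hat, theta_hat, V_inv, feats, reward_fn,
                            lr=0.01, rng=None):
    """Learn the confidence-bound size by gradient ascent (illustrative).

    The hard argmax over UCB scores is relaxed to a softmax so the
    arm-selection distribution is differentiable in beta_hat.
    """
    rng = rng or np.random.default_rng(0)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', feats, V_inv, feats))
    probs = softmax(feats @ theta_hat + beta_hat * widths)
    arm = int(rng.choice(len(feats), p=probs))
    r = reward_fn(arm)
    # Score-function gradient of the expected reward w.r.t. beta_hat.
    grad = r * (widths[arm] - probs @ widths)
    return arm, beta_hat + lr * grad
```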
arXiv Detail & Related papers (2020-06-04T16:43:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.