No-Regret Linear Bandits beyond Realizability
- URL: http://arxiv.org/abs/2302.13252v2
- Date: Wed, 19 Jul 2023 23:05:45 GMT
- Title: No-Regret Linear Bandits beyond Realizability
- Authors: Chong Liu, Ming Yin, Yu-Xiang Wang
- Abstract summary: We study linear bandits when the underlying reward function is not linear.
Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation.
We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$.
- Score: 34.06789803260447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study linear bandits when the underlying reward function is not linear.
Existing work relies on a uniform misspecification parameter $\epsilon$ that
measures the sup-norm error of the best linear approximation. This results in
an unavoidable linear regret whenever $\epsilon > 0$. We describe a more
natural model of misspecification which only requires the approximation error
at each input $x$ to be proportional to the suboptimality gap at $x$. It
captures the intuition that, for optimization problems, near-optimal regions
should matter more and we can tolerate larger approximation errors in
suboptimal regions. Quite surprisingly, we show that the classical LinUCB
algorithm -- designed for the realizable case -- is automatically robust
against such gap-adjusted misspecification. It achieves a near-optimal
$\sqrt{T}$ regret for problems where the best-known regret is almost linear in the
time horizon $T$. Technically, our proof relies on a novel self-bounding
argument that bounds the part of the regret due to misspecification by the
regret itself.
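The abstract refers to the classical LinUCB algorithm. Below is a minimal, generic sketch of that algorithm for context; the finite arm set, the ridge parameter `lam`, and the fixed exploration coefficient `beta` are illustrative assumptions (the paper's analysis uses a time-dependent confidence radius), and the non-linear toy reward only illustrates a misspecified reward function, not the paper's gap-adjusted model.

```python
import numpy as np

def linucb(arms, reward_fn, T, lam=1.0, beta=1.0):
    """Generic LinUCB sketch for a finite arm set.

    arms:      (K, d) array of arm feature vectors.
    reward_fn: callable returning a noisy scalar reward for a chosen arm.
    lam:       ridge-regularization parameter (illustrative choice).
    beta:      fixed exploration coefficient; a time-dependent confidence
               radius would be used in the theoretical analysis.
    """
    K, d = arms.shape
    V = lam * np.eye(d)     # regularized design matrix
    b = np.zeros(d)         # running sum of x_t * r_t
    rewards = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge-regression estimate of the linear model
        # Optimistic index: estimated reward plus exploration bonus ||x||_{V^{-1}}.
        bonus = np.sqrt(np.einsum("ki,ij,kj->k", arms, V_inv, arms))
        x = arms[np.argmax(arms @ theta_hat + beta * bonus)]
        r = reward_fn(x)
        V += np.outer(x, x)
        b += r * x
        rewards.append(r)
    return np.array(rewards)

# Toy usage with a mildly non-linear reward, so the linear model is misspecified.
rng = np.random.default_rng(0)
arms = rng.normal(size=(20, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_star = rng.normal(size=5)
f = lambda x: x @ theta_star + 0.1 * (x @ theta_star) ** 2   # non-linear reward
rewards = linucb(arms, lambda x: f(x) + 0.1 * rng.normal(), T=2000)
```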
Related papers
- Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits.
TRAiL ensures an $\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix.
We characterize an $\Omega(\sqrt{T})$ minimax lower bound for any algorithm on the expected regret.
arXiv Detail & Related papers (2024-11-19T01:08:13Z) - Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - On the Interplay Between Misspecification and Sub-optimality Gap in
Linear Contextual Bandits [76.2262680277608]
We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class.
We show that our algorithm enjoys the same gap-dependent regret bound $\tilde{O}(d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
arXiv Detail & Related papers (2023-03-16T15:24:29Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - First-Order Regret in Reinforcement Learning with Linear Function
Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
We study the optimal batch-regret tradeoff for batch linear contextual bandits.
We prove its regret guarantee, which features a two-phase expression as the time horizon $T$ grows.
We also prove a new matrix concentration inequality with a dependence on their dynamic upper bounds.
arXiv Detail & Related papers (2021-10-15T12:32:33Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Efficient and Robust Algorithms for Adversarial Linear Contextual
Bandits [13.32560004325655]
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem.
Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random, we develop efficient algorithms based on the classic Exp3 algorithm.
arXiv Detail & Related papers (2020-02-01T22:49:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.