Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
- URL: http://arxiv.org/abs/2110.08057v1
- Date: Fri, 15 Oct 2021 12:32:33 GMT
- Title: Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
- Authors: Zihan Zhang, Xiangyang Ji, Yuan Zhou
- Abstract summary: We study the optimal batch-regret tradeoff for batch linear contextual bandits.
We provide an algorithm and prove its regret guarantee, which features a two-phase expression as the time horizon $T$ grows.
We also prove a new matrix concentration inequality with dependence on dynamic upper bounds.
- Score: 45.43968161616453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the optimal batch-regret tradeoff for batch linear contextual
bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and
dimension $d$, we provide an algorithm and prove its regret guarantee, which,
for technical reasons, features a two-phase expression as the time horizon
$T$ grows. We also prove a lower bound theorem that surprisingly shows the
optimality of our two-phase regret upper bound (up to logarithmic factors) in
the \emph{full range} of the problem parameters, therefore establishing the
exact batch-regret tradeoff.
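To make the batched protocol concrete, the following is a minimal sketch of a generic batched linear contextual bandit loop in which a ridge-regression policy is refit only at batch boundaries. This is an illustration of the setting, not the paper's algorithm; the geometric endpoint grid, the UCB index, and all parameter names are assumptions.

```python
import numpy as np

def batched_linucb(contexts, pull, T, M, d, lam=1.0, beta=1.0):
    """Generic batched linear contextual bandit loop (illustrative sketch).

    The policy -- a UCB index built from a ridge-regression estimate -- is
    recomputed only at the M batch boundaries and frozen within each batch.
    contexts(t) returns a (K, d) array of action features; pull(t, a) returns
    the observed reward of action a at round t.
    """
    # An assumed geometric batch grid; the paper's schedule may differ.
    endpoints = np.unique(np.round(T ** ((np.arange(M) + 1.0) / M)).astype(int))
    V = lam * np.eye(d)              # regularized Gram matrix
    b = np.zeros(d)                  # running sum of x_t * r_t
    V_inv = np.linalg.inv(V)
    theta_hat = np.zeros(d)          # ridge estimate, refit once per batch
    total, start = 0.0, 0
    for end in endpoints:
        for t in range(start, min(int(end), T)):
            X = contexts(t)
            width = np.sqrt(np.einsum('kd,de,ke->k', X, V_inv, X))
            a = int(np.argmax(X @ theta_hat + beta * width))
            r = pull(t, a)
            V += np.outer(X[a], X[a])
            b += r * X[a]
            total += r
        V_inv = np.linalg.inv(V)     # policy update at the batch boundary
        theta_hat = V_inv @ b
        start = int(end)
    return total
```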
Compared to the recent work \citep{ruan2020linear}, which showed that $M =
O(\log \log T)$ batches suffice to achieve the asymptotically minimax-optimal
regret without the batch constraints, our algorithm is simpler and easier to
implement in practice. Furthermore, our algorithm achieves the optimal regret
for all $T \geq d$, while \citep{ruan2020linear} requires $T$ to be greater
than an unrealistically large polynomial of $d$.
As part of our analysis, we also prove a new matrix concentration inequality
whose deviation terms depend on the matrices' dynamic upper bounds; to the
best of our knowledge, it is the first of its kind in the literature and may
be of independent interest.
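For contrast, classical matrix concentration results such as the matrix Freedman inequality (Tropp, 2011) require a fixed almost-sure bound $R$ on the martingale increments; the paper's inequality replaces such fixed bounds with dynamic ones (see the paper for the exact statement). The classical form reads:

```latex
% Matrix Freedman inequality (Tropp, 2011), shown for contrast: the
% increments are bounded by a FIXED constant R, whereas the paper's new
% inequality allows dynamic (data-dependent) upper bounds.
\[
\Pr\left[\exists k:\ \lambda_{\max}(Y_k) \ge t \ \text{and}\
  \Big\|\sum_{j \le k} \mathbb{E}\big[X_j^2 \mid \mathcal{F}_{j-1}\big]\Big\|
  \le \sigma^2\right]
\le d \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right),
\]
where $Y_k = \sum_{j \le k} X_j$ is a matrix martingale with self-adjoint
$d \times d$ increments satisfying $\lambda_{\max}(X_j) \le R$ almost surely.
```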
Related papers
- Optimal and Practical Batched Linear Bandit Algorithm [8.087699764574788]
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit.
We propose \texttt{BLAE}, a novel batched algorithm that integrates arm elimination with regularized G-optimal design.
Our analysis introduces new techniques for batch-wise optimal design and refined concentration bounds.
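As a rough sketch of the G-optimal design ingredient mentioned above (not of BLAE itself), a design can be approximated by Frank-Wolfe iterations on the log-determinant criterion; `lam` and the iteration count are assumed knobs:

```python
import numpy as np

def g_optimal_design(X, lam=1e-3, iters=200):
    """Approximate a (regularized) G-optimal design over arm features X (K, d)
    via Frank-Wolfe on the log-det criterion. Illustrative sketch only."""
    K, d = X.shape
    pi = np.full(K, 1.0 / K)                       # start from uniform design
    for i in range(iters):
        A = X.T @ (pi[:, None] * X) + lam * np.eye(d)
        A_inv = np.linalg.inv(A)
        g = np.einsum('kd,de,ke->k', X, A_inv, X)  # leverage x_k^T A^{-1} x_k
        k = int(np.argmax(g))                      # arm with largest leverage
        step = 2.0 / (i + 2.0)                     # standard FW step size
        pi = (1 - step) * pi
        pi[k] += step
    return pi
```

By the Kiefer-Wolfowitz equivalence theorem, maximizing the log-determinant also minimizes the worst-case leverage $\max_k x_k^\top A(\pi)^{-1} x_k$, which is why this D-optimal iteration serves as a G-optimal design routine.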
arXiv Detail & Related papers (2025-07-11T09:29:28Z) - p-Mean Regret for Stochastic Bandits [52.828710025519996]
We introduce a simple, unified UCB-based algorithm that achieves novel $p$-mean regret bounds.
Our framework encompasses both average cumulative regret and Nash regret as special cases.
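As a quick illustration of the objective behind $p$-mean regret (the exact regret definition is in the paper), the power mean of positive per-round values interpolates between the arithmetic mean and the geometric mean:

```python
import numpy as np

def p_mean(rewards, p):
    """Power mean of positive per-round rewards: (1/T * sum r_t^p)^(1/p).
    p = 1 recovers the arithmetic mean (average-regret setting);
    p -> 0 recovers the geometric mean (Nash-regret setting)."""
    rewards = np.asarray(rewards, dtype=float)
    if p == 0:                                   # geometric-mean limit
        return float(np.exp(np.mean(np.log(rewards))))
    return float(np.mean(rewards ** p) ** (1.0 / p))
```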
arXiv Detail & Related papers (2024-12-14T08:38:26Z) - Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits.
TRAiL ensures an $\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix.
We characterize an $\Omega(\sqrt{T})$ minimax lower bound on the expected regret for any algorithm.
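The inference-quality metric above is simply the minimum eigenvalue of the growing Gram matrix; a small helper (illustrative, not TRAiL's exploration rule) can track it:

```python
import numpy as np

def min_eigenvalue_path(actions):
    """Track lambda_min of the design matrix V_t = sum_{s<=t} x_s x_s^T.
    A growth rate of order sqrt(t) in lambda_min is the inference-quality
    guarantee referenced above; this helper only measures it."""
    d = actions.shape[1]
    V = np.zeros((d, d))
    path = []
    for x in actions:
        V += np.outer(x, x)
        path.append(np.linalg.eigvalsh(V)[0])   # smallest eigenvalue
    return np.array(path)
```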
arXiv Detail & Related papers (2024-11-19T01:08:13Z) - Optimal Batched Linear Bandits [9.795531604467406]
We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Exploit framework.
With a proper choice of exploration rate, we prove that E$^4$ achieves the finite-time minimax optimal regret with only $O(\log\log T)$ batches.
We show that with another choice of exploration rate, E$^4$ achieves an instance-dependent regret bound requiring at most $O(\log T)$ batches.
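The Explore-Estimate-Exploit pattern can be sketched as follows; the round-robin exploration, phase length, and commit rule below are placeholder assumptions rather than E$^4$'s tuned choices:

```python
import numpy as np

def explore_estimate_exploit(arms_X, pull, T, explore_frac=0.1):
    """Skeleton of an Explore-Estimate-Exploit batched bandit (illustrative).
    arms_X: (K, d) fixed arm features; pull(a) returns a noisy reward."""
    K, d = arms_X.shape
    n0 = min(T, max(d, int(explore_frac * T)))
    # Phase 1: explore -- pull arms round-robin (a design-based rule in practice)
    idx = np.arange(n0) % K
    rewards = np.array([pull(int(a)) for a in idx])
    # Phase 2: estimate -- least squares on the exploration data
    theta_hat, *_ = np.linalg.lstsq(arms_X[idx], rewards, rcond=None)
    # Phase 3: exploit -- commit to the empirically best arm for the rest
    a_star = int(np.argmax(arms_X @ theta_hat))
    tail = sum(pull(a_star) for _ in range(T - n0))
    return rewards.sum() + tail
```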
arXiv Detail & Related papers (2024-06-06T14:57:52Z) - LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
This study considers the linear contextual bandit problem with independent and identically distributed contexts.
Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy, and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
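For intuition, in the non-contextual $1/2$-Tsallis case (the Tsallis-INF algorithm of Zimmert and Seldin), the FTRL update has the closed form $p_i = 4(\eta(\hat{L}_i - z))^{-2}$ with a scalar normalizer $z$ found by bisection; a sketch under that assumption (the paper's contextual variant differs):

```python
import numpy as np

def tsallis_inf_probs(L_hat, eta, tol=1e-10):
    """FTRL with 1/2-Tsallis entropy: p_i = 4 * (eta * (L_hat_i - z))^(-2),
    with z < min(L_hat) chosen by bisection so that sum(p) = 1.
    Sketch of the non-contextual update only."""
    L_hat = np.asarray(L_hat, dtype=float)
    lo = L_hat.min() - 4.0 * np.sqrt(len(L_hat)) / eta   # here sum(p) < 1
    hi = L_hat.min() - 1e-12                             # here sum(p) >> 1
    while hi - lo > tol:
        z = 0.5 * (lo + hi)
        total = np.sum(4.0 / (eta * (L_hat - z)) ** 2)
        lo, hi = (z, hi) if total < 1.0 else (lo, z)
    p = 4.0 / (eta * (L_hat - lo)) ** 2
    return p / p.sum()                                   # guard rounding error
```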
arXiv Detail & Related papers (2024-03-05T18:59:47Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Parameter-free Regret in High Probability with Heavy Tails [45.11667443216861]
We present new algorithms for online convex optimization over unbounded domains.
We obtain parameter-free regret in high probability given access only to potentially heavy-tailed subgradient estimates.
arXiv Detail & Related papers (2022-10-25T21:43:44Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$) dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
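A standard way to get $O(\log\log T)$ batches is a doubly-geometric grid $t_k = T^{1-2^{-k}}$; the snippet below computes such a grid as an assumed schedule (the paper's may differ):

```python
import numpy as np

def loglog_batch_grid(T):
    """Batch endpoints t_k = ceil(T^(1 - 2^{-k})) for k = 1..M-1, then T.
    M = O(log log T) batches suffice since t_{k+1} ~ sqrt(T * t_k)."""
    M = max(2, int(np.ceil(np.log2(np.log(T)))) + 1)
    grid = [int(np.ceil(T ** (1.0 - 2.0 ** (-k)))) for k in range(1, M)]
    return sorted(set(grid + [int(T)]))
```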
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
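The estimator behind this algorithm is easy to sketch: fit a Lasso on observed context-reward pairs and threshold small coordinates to recover the support. The `alpha` and `threshold` values below are placeholders; the paper specifies data-driven choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso_support(X, y, alpha=0.1, threshold=0.05):
    """Estimate the reward parameter and its sparse support by fitting a
    Lasso and thresholding small coordinates. Illustrative sketch only."""
    model = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
    theta_hat = model.coef_
    support = np.flatnonzero(np.abs(theta_hat) > threshold)
    return theta_hat, support
```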
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)