Batched Nonparametric Contextual Bandits
- URL: http://arxiv.org/abs/2402.17732v2
- Date: Mon, 10 Jun 2024 21:10:00 GMT
- Title: Batched Nonparametric Contextual Bandits
- Authors: Rong Jiang, Cong Ma
- Abstract summary: We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret.
- Score: 21.031965676746776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In essence, our procedure dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret in the fully online setting.
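To make the procedure concrete, here is a minimal Python sketch of batched successive elimination over a dynamically refined grid. It assumes a 1-D covariate drawn uniformly from [0, 1], a bin count that simply doubles at each batch boundary, and a generic UCB-style elimination radius; the paper's algorithm instead aligns bin widths with the batch sizes in a specific way, so this is an illustration of the general shape, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_batched_binning(T, batch_ends, K, reward_fn, noise=0.1):
    """Play T rounds; the binning and the surviving arms per bin are
    updated only at the end of each batch."""
    n_bins = 1
    active = [list(range(K))]   # surviving arms in each bin
    t = 0
    for end in batch_ends:
        # Refine the grid: each bin splits in two and the children inherit
        # the parent's surviving arms. (Doubling is a simplification; the
        # paper aligns bin widths with the batch sizes.)
        n_bins *= 2
        active = [list(active[b // 2]) for b in range(n_bins)]
        sums = np.zeros((n_bins, K))
        counts = np.zeros((n_bins, K))
        while t < end:
            x = rng.uniform()                     # observed covariate
            b = min(int(x * n_bins), n_bins - 1)  # bin containing x
            a = active[b][t % len(active[b])]     # cycle surviving arms
            r = reward_fn(x, a) + noise * rng.standard_normal()
            sums[b, a] += r
            counts[b, a] += 1
            t += 1
        # Batch boundary: in each bin, drop arms whose upper confidence
        # bound falls below the best lower confidence bound.
        for b in range(n_bins):
            n = np.maximum(counts[b], 1.0)
            means = sums[b] / n
            radius = np.sqrt(2.0 * np.log(T) / n)
            best_lcb = max(means[a] - radius[a] for a in active[b])
            active[b] = [a for a in active[b]
                         if means[a] + radius[a] >= best_lcb]
    return active

# Two arms whose reward curves cross at x = 0.5, so the better arm differs
# across bins; three batches mean only three policy updates in 20,000 rounds.
surviving = run_batched_binning(
    T=20_000,
    batch_ends=[2_000, 6_000, 20_000],
    K=2,
    reward_fn=lambda x, a: x if a == 0 else 1.0 - x,
)
```

The point of the sketch is the update pattern: everything the policy depends on (the grid and the surviving arms per bin) changes only at the batch boundaries.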
Related papers
- Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization [29.579719765255927]
We propose a new family of efficient bandit algorithms for the contextual bandit setting.
Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings.
arXiv Detail & Related papers (2023-07-05T08:34:54Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$.
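As a concrete illustration of this reward model, the following hypothetical snippet simulates an AR($k$) reward whose bias and coefficients depend on the chosen action; the two-action parameters, noise level, and uniform placeholder policy are assumptions made for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2  # order of the autoregressive reward process

# gamma[a] = (bias, AR coefficients) for action a. Each coefficient set
# sums to less than 1, keeping the process stable; an assumption in the
# spirit of the "mild assumptions on the reward process" above.
gamma = {
    0: (0.5, np.array([0.30, 0.10])),
    1: (0.8, np.array([0.10, 0.05])),
}

history = [0.0] * k   # the k most recent rewards, newest last
rewards = []
for t in range(1_000):
    a = int(rng.integers(2))                # placeholder policy: uniform
    bias, coefs = gamma[a]
    recent = np.array(history[-k:][::-1])   # newest first, matching coefs
    r = bias + coefs @ recent + 0.1 * rng.standard_normal()
    rewards.append(r)
    history.append(r)
```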
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental-design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- Stage-wise Conservative Linear Bandits [37.717532659194426]
We study bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials.
We present two novel algorithms that respect the baseline constraints and enjoy probabilistic regret bounds of order $O(\sqrt{T} \log T)$.
Notably, the proposed algorithms can be adjusted with only minor modifications to tackle different problem variations.
arXiv Detail & Related papers (2020-09-30T19:51:37Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose an algorithm based on doubly growing epochs that estimates the parameter via best subset selection.
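Both named ingredients admit a short sketch. The snippet below is a hypothetical illustration, not the paper's implementation: an epoch schedule whose lengths grow doubly exponentially, and an exact best-subset least-squares estimator that searches every size-s support (exponential in the dimension, so practical only for small d).

```python
import numpy as np
from itertools import combinations

def doubly_growing_epochs(T, first=2):
    """End points of epochs whose lengths square each time: 2, 4, 16, 256, ..."""
    ends, length, t = [], first, 0
    while t < T:
        t = min(t + length, T)
        ends.append(t)
        length *= length
    return ends

def best_subset_ols(X, y, s):
    """Exact best subset selection: the least-squares fit on the size-s
    support with the smallest residual sum of squares."""
    d = X.shape[1]
    best_beta, best_rss = None, np.inf
    for support in combinations(range(d), s):
        cols = list(support)
        coef = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_beta = np.zeros(d)
            best_beta[cols] = coef
            best_rss = rss
    return best_beta

# Example: recover a 2-sparse parameter in d = 8 dimensions.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 8))
beta_true = np.zeros(8)
beta_true[[1, 5]] = [1.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
print(doubly_growing_epochs(T=10_000))       # [2, 6, 22, 278, 10000]
print(best_subset_ols(X, y, s=2).round(2))
```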
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Sequential Batch Learning in Finite-Action Linear Contextual Bandits [40.01661188919779]
We study the sequential batch learning problem in linear contextual bandits with finite action sets.
This problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications.
arXiv Detail & Related papers (2020-04-14T06:47:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.