Sequential Batch Learning in Finite-Action Linear Contextual Bandits
- URL: http://arxiv.org/abs/2004.06321v1
- Date: Tue, 14 Apr 2020 06:47:40 GMT
- Title: Sequential Batch Learning in Finite-Action Linear Contextual Bandits
- Authors: Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W.
Glynn, Yinyu Ye
- Abstract summary: We study the sequential batch learning problem in linear contextual bandits with finite action sets.
This problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications.
- Score: 40.01661188919779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the sequential batch learning problem in linear contextual bandits
with finite action sets, where the decision maker is constrained to split
incoming individuals into (at most) a fixed number of batches and can only
observe outcomes for the individuals within a batch at the batch's end.
Compared to both standard online contextual bandit learning and offline policy
learning in contextual bandits, this sequential batch learning problem provides
a finer-grained formulation of many personalized sequential decision making
problems in practical applications, including medical treatment in clinical
trials, product recommendation in e-commerce and adaptive experiment design in
crowdsourcing.
We study two settings of the problem: one where the contexts are arbitrarily
generated and the other where the contexts are drawn i.i.d. from some
distribution. In each setting, we establish a regret lower bound and provide an
algorithm, whose regret upper bound nearly matches the lower bound. As an
important insight revealed therefrom, in the former setting, we show that the
number of batches required to achieve the fully online performance is
polynomial in the time horizon, while for the latter setting, a
pure-exploitation algorithm with a judicious batch partition scheme achieves
the fully online performance even when the number of batches is less than
logarithmic in the time horizon. Together, our results provide a near-complete
characterization of sequential decision making in linear contextual bandits
when batch constraints are present.
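To make the batch constraint concrete, the following is a minimal sketch (not the paper's algorithm) of a LinUCB-style learner that is only allowed to observe outcomes at batch boundaries: actions within a batch are chosen with parameters frozen from the previous batch's end, and the Gram matrix is updated only when a batch closes. The geometric grid spacing, the exploration parameter `alpha`, and the reward model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, K, M = 1000, 5, 10, 4          # horizon, dimension, actions, number of batches
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)       # unknown linear reward parameter

# Hypothetical geometric batch grid: update points spread multiplicatively
# over the horizon (a stand-in for a judicious batch partition scheme).
grid = np.unique(np.round(np.geomspace(1, T, M + 1)).astype(int))

A = np.eye(d)                        # regularized Gram matrix, frozen within a batch
b = np.zeros(d)
A_pending = np.zeros((d, d))         # buffered statistics awaiting the batch's end
b_pending = np.zeros(d)
alpha = 1.0
total_reward = 0.0

for t in range(1, T + 1):
    X = rng.normal(size=(K, d))      # K finite-action context vectors at time t
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Optimistic index: estimated reward plus confidence width.
    ucb = X @ theta_hat + alpha * np.sqrt(np.einsum('kd,de,ke->k', X, A_inv, X))
    a = int(np.argmax(ucb))
    r = X[a] @ theta + 0.1 * rng.normal()   # noisy linear reward
    total_reward += r
    # Outcomes are only revealed at the batch's end: buffer the update.
    A_pending += np.outer(X[a], X[a])
    b_pending += r * X[a]
    if t in grid:                    # batch boundary: observe outcomes, update model
        A += A_pending
        b += b_pending
        A_pending[:] = 0.0
        b_pending[:] = 0.0
```

In the abstract's terms, the number of entries in `grid` is the batch budget: with arbitrary contexts the paper shows this must grow polynomially in T to match fully online performance, whereas with i.i.d. contexts even a sub-logarithmic number of well-placed boundaries suffices for a pure-exploitation scheme.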
Related papers
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method matches the baseline performance with 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
- Batched Nonparametric Contextual Bandits [21.031965676746776]
We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret.
arXiv Detail & Related papers (2024-02-27T18:06:20Z)
- Continual Learning in Linear Classification on Separable Data [34.78569443156924]
We show that learning with weak regularization reduces to solving a sequential max-margin problem.
We then develop upper bounds on the forgetting and other quantities of interest under various settings.
We discuss several practical implications to popular training practices like regularization scheduling and weighting.
arXiv Detail & Related papers (2023-06-06T09:34:11Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits [18.64677502651614]
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits.
Our work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits.
arXiv Detail & Related papers (2020-08-27T05:34:34Z)
- Improved Algorithms for Conservative Exploration in Bandits [113.55554483194832]
We study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2).
We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems.
arXiv Detail & Related papers (2020-02-08T19:35:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.