Batched Neural Bandits
- URL: http://arxiv.org/abs/2102.13028v1
- Date: Thu, 25 Feb 2021 17:36:44 GMT
- Title: Batched Neural Bandits
- Authors: Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, and Dongruo Zhou
- Abstract summary: BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many sequential decision-making problems, the individuals are split into
several batches and the decision-maker is only allowed to change her policy at
the end of batches. These batch problems have a large number of applications,
ranging from clinical trials to crowdsourcing. Motivated by this, we study the
stochastic contextual bandit problem for general reward distributions under the
batched setting. We propose the BatchNeuralUCB algorithm which combines neural
networks with optimism to address the exploration-exploitation tradeoff while
keeping the total number of batches limited. We study BatchNeuralUCB under both
fixed and adaptive batch size settings and prove that it achieves the same
regret as the fully sequential version while reducing the number of policy
updates considerably. We confirm our theoretical results via simulations on
both synthetic and real-world datasets.
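The core mechanism of the abstract (optimistic arm selection with a neural reward estimator that is refit only at batch boundaries) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the paper's exploration bonus is built from network gradients, whereas this sketch substitutes a simple count-based bonus, and the names `TinyNet` and `batch_neural_ucb` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyNet:
    """One-hidden-layer ReLU network used as the reward estimator."""
    def __init__(self, d, h=16):
        self.W1 = rng.normal(0, 1 / np.sqrt(d), (h, d))
        self.b1 = np.zeros(h)
        self.w2 = rng.normal(0, 1 / np.sqrt(h), h)

    def predict(self, x):
        return float(np.maximum(self.W1 @ x + self.b1, 0) @ self.w2)

    def fit(self, X, y, lr=0.01, epochs=50):
        # Plain SGD on squared error; a stand-in for the paper's training step.
        for _ in range(epochs):
            for x, t in zip(X, y):
                h_pre = self.W1 @ x + self.b1
                h = np.maximum(h_pre, 0)
                g = h @ self.w2 - t            # prediction error
                self.w2 -= lr * g * h
                gh = g * self.w2 * (h_pre > 0)
                self.W1 -= lr * np.outer(gh, x)
                self.b1 -= lr * gh

def batch_neural_ucb(contexts, reward_fn, K, d, T, batch_size, beta=1.0):
    net = TinyNet(d)
    X_hist, y_hist = [], []
    counts = np.ones(K)  # simplified count-based bonus, not the paper's gradient-norm bonus
    total = 0.0
    for t in range(T):
        ctx = contexts[t]  # shape (K, d): one context vector per arm
        ucb = np.array([net.predict(ctx[a]) + beta / np.sqrt(counts[a]) for a in range(K)])
        a = int(np.argmax(ucb))  # optimistic arm choice
        r = reward_fn(t, a)
        X_hist.append(ctx[a]); y_hist.append(r)
        counts[a] += 1
        total += r
        # The key point of the batched setting: the policy (the network) is
        # refit only at the end of each batch, not after every single pull.
        if (t + 1) % batch_size == 0:
            net.fit(np.array(X_hist), np.array(y_hist))
    return total

# Demo on a synthetic linear-reward problem.
K, d, T = 5, 8, 100
theta = rng.normal(size=d)
ctxs = rng.normal(size=(T, K, d))
total = batch_neural_ucb(ctxs, lambda t, a: float(ctxs[t, a] @ theta) + rng.normal(0, 0.1),
                         K, d, T, batch_size=25)
```

With `batch_size=25`, the network here is retrained only 4 times over 100 pulls, which is the "limited number of policy updates" the abstract refers to; the fixed schedule above corresponds to the fixed batch size setting, while the adaptive setting would instead trigger a refit by a data-dependent rule.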
Related papers
- Batched Nonparametric Contextual Bandits [21.031965676746776]
We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret.
arXiv Detail & Related papers (2024-02-27T18:06:20Z) - Batch Bayesian Optimization for Replicable Experimental Design [56.64902148159355]
Many real-world design problems evaluate multiple experimental conditions in parallel and replicate each condition multiple times due to large and heteroscedastic observation noise.
We propose the Batch Thompson Sampling for Replicable Experimental Design framework, which encompasses three algorithms.
We show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
arXiv Detail & Related papers (2023-11-02T12:46:03Z) - Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards [0.4194295877935867]
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.
We propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards.
We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem.
arXiv Detail & Related papers (2023-03-01T16:22:22Z) - Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
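The "train an NN, then choose the query by maximizing it" step described in this entry can be illustrated as below. This is a hedged sketch, not the paper's method: the function name `sample_then_optimize`, the network size, and the learning-rate choices are all assumptions, and the random re-initialization here merely stands in for the paper's principled sampling step in the Thompson sampling policy.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_then_optimize(X_obs, y_obs, candidates, h=32, lr=0.05, epochs=500):
    """Pick the next query by (1) sampling a fresh network and (2) fitting it."""
    d = candidates.shape[1]
    # "Sample": draw a random initialization, standing in for a draw from the prior.
    W1 = rng.normal(0, 1 / np.sqrt(d), (h, d))
    b1 = np.zeros(h)
    w2 = rng.normal(0, 1 / np.sqrt(h), h)
    # "Optimize": fit the sampled network to the observed data by
    # full-batch gradient descent on mean squared error.
    for _ in range(epochs):
        H_pre = X_obs @ W1.T + b1
        H = np.maximum(H_pre, 0)
        g = (H @ w2 - y_obs) / len(y_obs)
        w2 -= lr * H.T @ g
        GH = np.outer(g, w2) * (H_pre > 0)
        W1 -= lr * GH.T @ X_obs
        b1 -= lr * GH.sum(axis=0)
    # Choose the query by maximizing the trained network over the candidates;
    # note no large parameter or kernel matrix is ever inverted.
    scores = np.maximum(candidates @ W1.T + b1, 0) @ w2
    return int(np.argmax(scores))

# Demo: 20 observed points of a smooth function, 50 candidate queries.
d = 4
X_obs = rng.normal(size=(20, d))
y_obs = np.sin(X_obs.sum(axis=1))
cands = rng.normal(size=(50, d))
idx = sample_then_optimize(X_obs, y_obs, cands)
```

The design point the entry highlights is visible in the sketch: randomness enters only through the sampled initialization, so each query selection costs one network training run plus an argmax, with no matrix inversion.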
arXiv Detail & Related papers (2022-10-13T09:01:58Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB)
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Adaptive Combinatorial Allocation [77.86290991564829]
We consider settings where an allocation has to be chosen repeatedly, returns are unknown but can be learned, and decisions are subject to constraints.
Our model covers two-sided and one-sided matching, even with complex constraints.
arXiv Detail & Related papers (2020-11-04T15:02:59Z) - Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits [18.64677502651614]
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits.
Our work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits.
arXiv Detail & Related papers (2020-08-27T05:34:34Z) - Sequential Batch Learning in Finite-Action Linear Contextual Bandits [40.01661188919779]
We study the sequential batch learning problem in linear contextual bandits with finite action sets.
This problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications.
arXiv Detail & Related papers (2020-04-14T06:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences.