The Impact of Batch Learning in Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2202.06657v1
- Date: Mon, 14 Feb 2022 12:27:06 GMT
- Title: The Impact of Batch Learning in Stochastic Linear Bandits
- Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
- Abstract summary: We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period.
Our main theoretical results show that the impact of batch learning can be measured in proportion to the regret of online behavior.
- Score: 7.3449418475577595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a special case of bandit problems, named batched bandits, in
which an agent observes batches of responses over a certain time period. Unlike
previous work, we consider a practically relevant batch-centric scenario of
batch learning. That is to say, we provide a policy-agnostic regret analysis
and demonstrate upper and lower bounds for the regret of a candidate policy.
Our main theoretical results show that the impact of batch learning can be
measured in proportion to the regret of online behavior. Primarily, we study two
settings of the problem: instance-independent and instance-dependent. While the
upper bound is the same for both settings, the worst-case lower bound is more
comprehensive in the former case and more accurate in the latter one. Also, we
provide a more robust result for the 2-armed bandit problem as an important
insight. Finally, we demonstrate the consistency of theoretical results by
conducting empirical experiments and reflect on the optimal batch size choice.
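To make the batch-centric scenario concrete, below is a minimal simulation sketch (our illustration, not the authors' code) of a LinUCB-style agent in a stochastic linear bandit whose responses arrive, and whose estimates are refreshed, only at batch boundaries. The dimension, horizon, batch sizes, noise level, and confidence width are arbitrary assumptions.

```python
# Illustrative simulation of batch learning in a stochastic linear bandit.
# All parameters and the LinUCB variant are assumptions for demonstration;
# this is not the paper's implementation.
import numpy as np

def run_lin_ucb(T, batch_size, d=5, n_arms=10, noise=0.1,
                lam=1.0, alpha=1.0, seed=0):
    """LinUCB whose estimates are refreshed only at batch boundaries."""
    rng = np.random.default_rng(seed)  # same instance for every batch size
    theta_star = rng.normal(size=d)
    theta_star /= np.linalg.norm(theta_star)
    arms = rng.normal(size=(n_arms, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)

    A = lam * np.eye(d)              # ridge-regularized Gram matrix
    b = np.zeros(d)                  # running sum of reward-weighted contexts
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    pending = []                     # responses withheld until the batch ends
    regret = 0.0
    opt = (arms @ theta_star).max()  # best achievable expected reward

    for t in range(1, T + 1):
        # UCB scores use the estimate from the last completed batch only.
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        x = arms[int(np.argmax(arms @ theta_hat + alpha * widths))]
        r = x @ theta_star + noise * rng.normal()
        regret += opt - x @ theta_star
        pending.append((x, r))
        if t % batch_size == 0:      # batch ends: observe responses, update
            for x_i, r_i in pending:
                A += np.outer(x_i, x_i)
                b += r_i * x_i
            pending.clear()
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ b
    return regret

for bs in (1, 10, 100):
    print(f"batch size {bs:>3}: regret {run_lin_ucb(T=2000, batch_size=bs):.1f}")
```

Running the sketch for several batch sizes gives an informal feel for the result: larger batches force more decisions to be made with stale estimates, and the extra regret grows with batch size, consistent with the paper's claim that the impact of batch learning scales with the regret of online behavior.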
Related papers
- Causal Contextual Bandits with Adaptive Context [12.205797997133397]
We study a variant of causal contextual bandits where the context is chosen based on an initial intervention chosen by the learner.
We prove that our simple regret is essentially tight for a large class of instances.
arXiv Detail & Related papers (2024-05-28T22:17:57Z) - Thompson Sampling in Partially Observable Contextual Bandits [2.465689259704613]
We study bandit policies for learning to select optimal arms based on observed data.
Our theoretical analysis shows that the Thompson sampling policy successfully balances exploration and exploitation; a minimal Thompson sampling sketch appears after this list.
These techniques pave the road towards studying other decision-making problems with contextual information as well as partial observations.
arXiv Detail & Related papers (2024-02-15T19:37:39Z) - On Penalization in Stochastic Multi-armed Bandits [22.04356596828437]
We study an important variant of the multi-armed bandit (MAB) problem, which takes penalization into consideration.
We propose a hard-threshold UCB-like algorithm, which enjoys many merits, including fairness, nearly optimal regret, and a better tradeoff between reward and fairness.
arXiv Detail & Related papers (2022-11-15T17:13:09Z) - Batched Dueling Bandits [13.69077222007053]
We study the batched $K$-armed dueling bandit problem under two standard settings.
We obtain algorithms with a smooth trade-off between the number of batches and regret.
arXiv Detail & Related papers (2022-02-22T04:02:36Z) - The Impact of Batch Learning in Stochastic Bandits [5.008064542274928]
We consider a special case of bandit problems, namely batched bandits.
Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain time period.
We provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy.
arXiv Detail & Related papers (2021-11-03T08:38:10Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to the Bellman optimality and evaluation back-ups, taking a more conservative update, can yield much stronger guarantees.
arXiv Detail & Related papers (2020-07-16T09:25:54Z) - Predictive Bandits [68.8204255655161]
We introduce and study a new class of bandit problems, referred to as predictive bandits.
In each round, the decision maker first decides whether to gather information about the rewards of particular arms.
The decision maker then selects an arm to be actually played in the round.
arXiv Detail & Related papers (2020-04-02T17:12:33Z)
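As referenced in the Thompson sampling entry above, the following is a minimal Beta-Bernoulli Thompson sampling sketch. It is our illustration for the basic multi-armed setting, not the cited paper's partially observable contextual setting; the arm means, horizon, and seed are assumptions.

```python
# Minimal Beta-Bernoulli Thompson sampling (illustrative assumptions only):
# sampling from the posterior naturally balances exploration and exploitation.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])  # assumed arm reward probabilities
alpha = np.ones(len(true_means))        # Beta posterior: 1 + successes
beta = np.ones(len(true_means))         # Beta posterior: 1 + failures

regret = 0.0
for t in range(5000):
    samples = rng.beta(alpha, beta)     # one draw per arm from its posterior
    arm = int(np.argmax(samples))       # play the arm that looks best now
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward                # update the played arm's posterior
    beta[arm] += 1 - reward
    regret += true_means.max() - true_means[arm]

print(f"cumulative regret after 5000 rounds: {regret:.1f}")
```

Because under-explored arms retain wide posteriors, they are occasionally sampled above the empirical leader, which is the exploration-exploitation balance the entry refers to.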
This list is automatically generated from the titles and abstracts of the papers on this site.