The Impact of Batch Learning in Stochastic Bandits
- URL: http://arxiv.org/abs/2111.02071v1
- Date: Wed, 3 Nov 2021 08:38:10 GMT
- Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, and Maurits Kaptein
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a special case of bandit problems, namely batched bandits.
Motivated by natural restrictions of recommender systems and e-commerce
platforms, we assume that a learning agent observes responses batched in groups
over a certain time period. Unlike previous work, we consider a more
practically relevant batch-centric scenario of batch learning. We provide a
policy-agnostic regret analysis and demonstrate upper and lower bounds for the
regret of a candidate policy. Our main theoretical results show that the impact
of batch learning can be measured in terms of online behavior. Finally, we
demonstrate the consistency of theoretical results by conducting empirical
experiments and reflect on the optimal batch size choice.
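The batch-learning constraint described in the abstract can be illustrated with a small sketch (our own illustration, not the paper's algorithm): a UCB1 agent on Bernoulli arms whose reward feedback is revealed only at batch boundaries, so every action inside a batch is chosen from stale statistics. The arm means and batch sizes below are hypothetical.

```python
import math
import random

def batched_ucb1(means, horizon, batch_size, seed=0):
    """UCB1 on Bernoulli arms where feedback arrives only in batches.

    `means` are the (assumed) true arm means. The agent's statistics are
    frozen between batch boundaries, mimicking a recommender system that
    only observes responses grouped over a time period.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # observed pulls per arm (updated once per batch)
    sums = [0.0] * k      # observed reward totals (updated once per batch)
    pending = []          # (arm, reward) pairs awaiting the batch update
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if any(c == 0 for c in counts):
            arm = counts.index(0)   # ensure each arm is observed at least once
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_reward += reward
        pending.append((arm, reward))

        if t % batch_size == 0:     # batch boundary: reveal queued responses
            for a, r in pending:
                counts[a] += 1
                sums[a] += r
            pending.clear()

    regret = horizon * max(means) - total_reward
    return regret, counts
```

With `batch_size=1` this reduces to standard online UCB1; larger batches delay the updates, which typically increases regret, in line with the abstract's message that the impact of batch learning can be measured in terms of online behavior.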
Related papers
- On the Value of Stochastic Side Information in Online Learning
We study the effectiveness of side information in deterministic online learning scenarios.
We assume that certain side information is available to the forecaster but not the experts.
arXiv Detail & Related papers (2023-03-09T15:06:07Z)
- Systematic Evaluation of Predictive Fairness
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- The Impact of Batch Learning in Stochastic Linear Bandits
We consider a special case of bandit problems, namely batched bandits, in which an agent observes batches of responses over a certain time period.
Our main theoretical results show that the impact of batch learning can be measured in proportion to the regret of online behavior.
arXiv Detail & Related papers (2022-02-14T12:27:06Z)
- Achieving Minimax Rates in Pool-Based Batch Active Learning
We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle.
In this paper we propose a solution which requires a careful trade-off between the informativeness of the queried points and their diversity.
arXiv Detail & Related papers (2022-02-11T04:55:45Z)
- An Investigation of Replay-based Approaches for Continual Learning
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance.
Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy.
We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms.
arXiv Detail & Related papers (2021-06-28T17:37:49Z)
- Investigating the Role of Negatives in Contrastive Representation Learning
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one of these parameters: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration
Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to Bellman optimality and evaluation back-up to take a more conservative update can have much stronger guarantees.
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Learning "What-if" Explanations for Sequential Decision-Making
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect
We study the effect of persistence of engagement on learning in a multi-armed bandit setting.
We provide novel algorithms that achieve sublinear regret in time and in the relevant wear-in/wear-out parameters.
arXiv Detail & Related papers (2020-06-18T08:27:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.