Predictive Bandits
- URL: http://arxiv.org/abs/2004.01141v1
- Date: Thu, 2 Apr 2020 17:12:33 GMT
- Title: Predictive Bandits
- Authors: Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson
- Abstract summary: We introduce and study a new class of bandit problems, referred to as predictive bandits.
In each round, the decision maker first decides whether to gather information about the rewards of particular arms.
The decision maker then selects an arm to be actually played in the round.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce and study a new class of stochastic bandit problems, referred to
as predictive bandits. In each round, the decision maker first decides whether
to gather information about the rewards of particular arms (so that their
rewards in this round can be predicted). These measurements are costly, and may
be corrupted by noise. The decision maker then selects an arm to be actually
played in the round. Predictive bandits find applications in many areas; e.g.
they can be applied to channel selection problems in radio communication
systems. In this paper, we provide the first theoretical results about
predictive bandits, and focus on scenarios where the decision maker is allowed
to measure at most one arm per round. We derive asymptotic instance-specific
regret lower bounds for these problems, and develop algorithms whose regret
matches these fundamental limits. We illustrate the performance of our algorithms
through numerical experiments. In particular, we highlight the gains that can
be achieved by using reward predictions, and investigate the impact of the
noise in the corresponding measurements.
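To make the interaction protocol concrete, below is a minimal simulation sketch of the measure-then-play loop the abstract describes. The Gaussian arm model, the measurement-noise level `sigma_m`, the cost `c`, and the naive measurement and play rules are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                # number of arms
mu = rng.normal(0.0, 1.0, K)         # unknown per-arm means (assumption: Gaussian)
sigma_m = 0.5                        # measurement noise std (assumption)
c = 0.1                              # cost per measurement (assumption)
T = 1000

means, counts = np.zeros(K), np.zeros(K)
total_reward = 0.0

for t in range(T):
    realized = mu + rng.normal(0.0, 1.0, K)   # this round's rewards, hidden

    # Step 1: measure at most one arm; the measurement is a costly, noisy
    # prediction of that arm's reward *in this round*.
    m = t % K if t < K else int(np.argmax(means))
    prediction = realized[m] + rng.normal(0.0, sigma_m)
    total_reward -= c

    # Step 2: play an arm. Trust the prediction if it beats the best
    # empirical mean among the other arms (a naive rule, not the paper's).
    others = [k for k in range(K) if k != m]
    best_other = max(others, key=lambda k: means[k])
    a = m if prediction >= means[best_other] else best_other

    r = realized[a]
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]
    total_reward += r
```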
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
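As a rough illustration of learning a reward function from preference feedback, here is a minimal sketch in the spirit of this entry, assuming a Bradley-Terry style preference model and synthetic arm features; it is not the paper's algorithm.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d = 4                                        # arm feature dimension (assumption)
net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# Synthetic preference data: feature pairs and a label saying whether the
# first arm was preferred (generated from a hidden linear reward, an assumption).
xa, xb = torch.randn(256, d), torch.randn(256, d)
true_w = torch.randn(d)
prefs = ((xa @ true_w) > (xb @ true_w)).float().unsqueeze(1)

for _ in range(200):
    # Bradley-Terry style logit: estimated reward gap between the two arms.
    logits = net(xa) - net(xb)
    loss = loss_fn(logits, prefs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# net(x) now scores arms: a higher output means "more likely to be preferred".
```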
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
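A minimal sketch of how such an abstention option might work in simulation, assuming a known guaranteed payoff `abstain_value` for abstaining and an epsilon-greedy base policy; both mechanics are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.2, 0.5, 0.8])   # unknown Bernoulli means (assumption)
abstain_value = 0.4              # guaranteed payoff when abstaining (assumption)
K = 3
means, counts = np.zeros(K), np.zeros(K)
total = 0.0

for t in range(500):
    # Epsilon-greedy arm selection, followed by a separate abstention decision.
    a = int(rng.integers(K)) if (t < K or rng.random() < 0.1) else int(np.argmax(means))
    if t >= K and means[a] < abstain_value:
        total += abstain_value   # abstain: the reward is never observed, no update
    else:
        r = float(rng.random() < mu[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        total += r
```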
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Pure Exploration of Causal Bandits [9.77519365079468]
The causal bandit problem integrates causal inference with multi-armed bandits.
The online learning task is as follows: given a causal graph with unknown causal inference distributions, we can choose either to intervene on one variable or to do no intervention.
We provide the first gap-dependent, fully adaptive pure exploration algorithms on three types of causal models.
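The intervene-or-observe choice can be illustrated on a toy two-variable graph; the model below (a single edge X -> Y, no confounders, alternating actions) is an assumption for illustration, not one of the paper's three causal model classes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy causal graph X -> Y with no confounders (assumption):
# P(X=1) = 0.3 observationally, and P(Y=1 | X=x) = p_y_given_x[x].
p_x = 0.3
p_y_given_x = np.array([0.2, 0.7])

counts = np.zeros((2, 2))  # counts[x, y], pooled over both action types

for t in range(1000):
    if t % 2 == 0:
        x = int(rng.random() < p_x)    # no intervention: nature draws X
    else:
        x = int(rng.integers(2))       # intervene: set do(X=x) uniformly
    y = int(rng.random() < p_y_given_x[x])
    counts[x, y] += 1

# With no confounders, observational and interventional data can be pooled
# to estimate E[Y | do(X=x)]; the best intervention maximizes this estimate.
est = counts[:, 1] / counts.sum(axis=1)
print("estimated E[Y | do(X=x)]:", est, "-> best: do(X=%d)" % int(np.argmax(est)))
```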
arXiv Detail & Related papers (2022-06-16T02:19:37Z)
- A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit [1.2183405753834562]
This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one.
We obtain the leading-order terms of the minimax optimal regret and pseudoregret for this problem by associating each of them with a solution of a linear heat equation.
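For context, a linear heat equation in one spatial variable has the general form below; the particular scaling, initial data, and boundary conditions used in the paper's association are not reproduced here.

```latex
\[
  \partial_t u(t, x) = \tfrac{1}{2}\, \partial_{xx} u(t, x)
\]
```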
arXiv Detail & Related papers (2022-02-11T17:03:18Z)
- Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
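A minimal sketch of the mean-of-medians idea as stated above: partition the samples into fixed-size groups, take each group's median, and average. The group size and the Cauchy test distribution are illustrative choices; the paper's estimator and analysis may differ in details.

```python
import numpy as np

def mean_of_medians(samples: np.ndarray, group_size: int) -> float:
    """Average the medians of consecutive fixed-size groups of samples."""
    n = (len(samples) // group_size) * group_size
    groups = samples[:n].reshape(-1, group_size)
    return float(np.mean(np.median(groups, axis=1)))

rng = np.random.default_rng(3)
# Super heavy-tailed payoffs: a standard Cauchy shifted to location 2.0,
# for which the raw empirical mean does not even converge.
x = 2.0 + rng.standard_cauchy(10_000)
print(mean_of_medians(x, group_size=25))   # stable, close to 2.0
print(float(np.mean(x)))                   # erratic for Cauchy samples
```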
arXiv Detail & Related papers (2021-10-26T17:30:44Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that Confidence-Budget Matching (CBM) based algorithms perform well in the presence of adversity.
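As a loose sketch of tying exploration to a querying budget, the snippet below scales a UCB-style bonus by the remaining budget and acts greedily once the budget is spent. This is an assumed, simplified reading of the idea, not the actual Confidence-Budget Matching rule.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.3, 0.5, 0.7])   # unknown Bernoulli means (assumption)
K, T, budget = 3, 2000, 300      # horizon and total query budget (assumptions)
means, counts = np.zeros(K), np.zeros(K)

for t in range(T):
    if budget > 0:
        if t < K:
            a = t                # pull each arm once to initialize
        else:
            # Bonus shrinks as the remaining budget shrinks: generous
            # exploration while many queries remain, greedier near exhaustion.
            bonus = np.sqrt(np.log(1 + budget) / counts)
            a = int(np.argmax(means + bonus))
        r = float(rng.random() < mu[a])      # query: the reward is observed
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        budget -= 1
    else:
        a = int(np.argmax(means))            # budget spent: act greedily, no feedback
```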
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
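A minimal simulation of composite and anonymous feedback as described above: each pull's reward is spread over the next few rounds (a uniform split, an assumption), and the learner observes only the per-round aggregate, with no attribution to arms. The uniform-play policy is a placeholder, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.3, 0.7])        # unknown Bernoulli means (assumption)
T, delay = 200, 3                # feedback is spread over `delay` rounds (assumption)
pending = np.zeros(T + delay)    # reward mass scheduled to arrive at each round

for t in range(T):
    a = int(rng.integers(2))     # placeholder policy: play uniformly at random
    r = float(rng.random() < mu[a])
    # Composite feedback: this pull's reward is split evenly over the
    # next `delay` rounds rather than arriving immediately.
    pending[t:t + delay] += r / delay
    # Anonymous feedback: only the aggregate arriving now is observed,
    # with no indication of which past pulls (or arms) produced it.
    observed = pending[t]
```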
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
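A minimal sketch of the mode-versus-mean distinction driving this entry: a histogram-based mode estimate applied to two synthetic arms whose mode ordering and mean ordering disagree. The distributions and the estimator are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def estimate_mode(samples: np.ndarray, bins: int = 20) -> float:
    """Histogram-based mode estimate: midpoint of the most populated bin."""
    hist, edges = np.histogram(samples, bins=bins)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])

rng = np.random.default_rng(6)
# Two arms whose modes and means disagree: arm 1 has the higher mean, but
# arm 0 has the higher mode (illustrative distributions, not the paper's).
arm0 = rng.normal(1.0, 0.2, 2000)                  # mode ~= mean ~= 1.0
arm1 = np.where(rng.random(2000) < 0.7,
                rng.normal(0.5, 0.2, 2000),
                rng.normal(3.0, 0.2, 2000))        # mean ~= 1.25, mode ~= 0.5

print(estimate_mode(arm0), float(np.mean(arm0)))   # mode-optimal arm: 0
print(estimate_mode(arm1), float(np.mean(arm1)))   # mean-optimal arm: 1
```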
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.