Survey Bandits with Regret Guarantees
- URL: http://arxiv.org/abs/2002.09814v1
- Date: Sun, 23 Feb 2020 03:24:03 GMT
- Title: Survey Bandits with Regret Guarantees
- Authors: Sanath Kumar Krishnamurthy, Susan Athey
- Abstract summary: We consider a variant of the contextual bandit problem.
In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user.
We propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.
- Score: 15.247024659017542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a variant of the contextual bandit problem. In standard
contextual bandits, when a user arrives we get the user's complete feature
vector and then assign a treatment (arm) to that user. In a number of
applications (like healthcare), collecting features from users can be costly.
To address this issue, we propose algorithms that avoid needless feature
collection while maintaining strong regret guarantees.
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Leveraging User-Triggered Supervision in Contextual Bandits [34.58466163463977]
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
arXiv Detail & Related papers (2023-02-07T22:42:27Z) - Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear
Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply a $O(dsqrtTlog T)$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z) - Contextual Bandits with Smooth Regret: Efficient Learning in Continuous
Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits [66.02233330016435]
We provide problem-dependent guarantees on both the regret and the asked feedback.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
arXiv Detail & Related papers (2021-10-12T03:24:57Z) - Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z) - Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards.
Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps.
We propose a UCB-based variant of the full-information algorithm that guarantees a $mathcalO(log T)$-regret w.r.t. an $alpha$regret strategy in $T time steps, matching the $Omega(log(T)$ lower bound
arXiv Detail & Related papers (2020-03-06T20:34:42Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.