Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits
- URL: http://arxiv.org/abs/2110.05724v1
- Date: Tue, 12 Oct 2021 03:24:57 GMT
- Title: Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits
- Authors: Nadav Merlis, Yonathan Efroni, Shie Mannor
- Abstract summary: We provide problem-dependent guarantees on both the regret and the amount of feedback requested.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
- Score: 66.02233330016435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a stochastic multi-armed bandit setting where feedback is limited
by a (possibly time-dependent) budget, and reward must be actively queried for
it to be observed. Previous works on this setting assumed a strict feedback
budget and focused on not violating this constraint while providing
problem-independent regret guarantees. In this work, we provide
problem-dependent guarantees on both the regret and the amount of feedback requested. In
particular, we derive problem-dependent lower bounds on the required feedback
and show that there is a fundamental difference between problems with a unique
optimal arm and those with multiple optimal arms. Furthermore, we present a new algorithm called
BuFALU for which we derive problem-dependent regret and cumulative feedback
bounds. Notably, we show that BuFALU naturally adapts to the number of optimal
arms.
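
To make the feedback protocol concrete, the following minimal Python sketch shows a budgeted bandit loop: the learner pulls an arm every round, but the reward is only observed when the learner spends a unit of feedback budget to ask for it. The query rule used here (keep asking while the pulled arm's confidence interval still overlaps another arm's) is an illustrative UCB-style heuristic; the class name `BudgetedUCB` and all parameters are hypothetical and this is not the BuFALU algorithm.

```python
import numpy as np

class BudgetedUCB:
    """Illustrative sketch of the budgeted-bandit protocol (NOT BuFALU):
    rewards are only observed when the learner explicitly asks for
    feedback, and each query consumes one unit of budget."""

    def __init__(self, n_arms, horizon):
        self.n_arms = n_arms
        self.horizon = horizon
        self.t = 0
        self.counts = np.zeros(n_arms)   # feedback observations per arm
        self.means = np.zeros(n_arms)    # empirical means over observed rounds

    def _radius(self, k):
        # Hoeffding-style confidence radius; infinite until first observation.
        if self.counts[k] == 0:
            return np.inf
        return np.sqrt(2.0 * np.log(max(self.t, 2)) / self.counts[k])

    def select_arm(self):
        ucb = [self.means[k] + self._radius(k) for k in range(self.n_arms)]
        return int(np.argmax(ucb))

    def should_query(self, arm, budget_left):
        # Illustrative heuristic: ask for feedback only while the identity of
        # the best arm is unsettled, i.e., some other arm's UCB still exceeds
        # the pulled arm's lower confidence bound.
        if budget_left <= 0:
            return False
        lcb = self.means[arm] - self._radius(arm)
        return any(self.means[j] + self._radius(j) >= lcb
                   for j in range(self.n_arms) if j != arm)

    def update(self, arm, reward):
        self.t += 1
        if reward is not None:  # feedback was asked, so the reward is observed
            self.counts[arm] += 1
            self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run(true_means, horizon, budget, seed=0):
    rng = np.random.default_rng(seed)
    agent = BudgetedUCB(len(true_means), horizon)
    queries = 0
    for _ in range(horizon):
        arm = agent.select_arm()
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        if agent.should_query(arm, budget - queries):
            queries += 1
            agent.update(arm, reward)  # reward observed
        else:
            agent.update(arm, None)    # reward earned but never observed
    return agent.means, queries

means, used = run([0.9, 0.5, 0.4], horizon=10_000, budget=500)
print("empirical means:", means, "| feedback used:", used)
```

Note how the loop separates earning reward from observing it: the environment draws a reward every round, but the estimates are only updated on queried rounds, which is the distinguishing feature of this setting.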
Related papers
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it (a minimal sketch of this protocol appears after this list).
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability and demonstrate that both the lower and upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
- BanditQ: Fair Bandits with Guaranteed Rewards [10.74025233418392]
Classic no-regret multi-armed bandit algorithms are inherently unfair by design.
We propose a new online policy, called BanditQ, that achieves the target reward rates while incurring a penalty in regret and target-rate violation.
The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem.
arXiv Detail & Related papers (2023-04-11T13:39:47Z)
- Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ two non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z)
- Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget [6.759124697337311]
We consider the combinatorial bandit problem with semi-bandit feedback under finite sampling budget constraints.
The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received.
We suggest a generic algorithm that covers the full spectrum of conceivable arm-elimination strategies.
arXiv Detail & Related papers (2022-02-09T14:36:05Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that Confidence-Budget Matching (CBM) based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
- Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
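
For contrast with the query-budget loop sketched above, the abstention protocol from "Multi-Armed Bandits with Abstention" (referenced earlier in this list) can be illustrated in a few lines. This is a sketch under stated assumptions, not the paper's algorithm: the fixed `abstain_payoff`, the threshold decision rule, and the choice that an abstained reward is never observed are all assumptions of this illustration.

```python
import numpy as np

def abstention_round(rng, true_means, est_means, abstain_payoff=0.5):
    """One round of a hypothetical abstention protocol: the agent commits to
    an arm and, BEFORE observing the instantaneous reward, may abstain from
    accepting it in exchange for a fixed payoff (an assumption of this
    sketch)."""
    arm = int(np.argmax(est_means))                 # any selection rule works
    reward = float(rng.random() < true_means[arm])  # drawn now, not yet shown
    abstain = est_means[arm] < abstain_payoff       # decided BEFORE observing
    if abstain:
        return arm, abstain_payoff, None            # fixed payoff; reward unseen
    return arm, reward, reward                      # accept: collect and observe

rng = np.random.default_rng(1)
print(abstention_round(rng, true_means=[0.9, 0.3], est_means=[0.6, 0.4]))
```

The structural point is the ordering: the accept/abstain decision is committed before the realized reward is revealed, which is what distinguishes this setting from simply skipping a round.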
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.