Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits
- URL: http://arxiv.org/abs/2110.05724v1
- Date: Tue, 12 Oct 2021 03:24:57 GMT
- Title: Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits
- Authors: Nadav Merlis, Yonathan Efroni, Shie Mannor
- Abstract summary: We provide problem-dependent guarantees on both the regret and the amount of feedback requested.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
- Score: 66.02233330016435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a stochastic multi-armed bandit setting where feedback is limited
by a (possibly time-dependent) budget, and reward must be actively queried for
it to be observed. Previous works on this setting assumed a strict feedback
budget and focused on not violating this constraint while providing
problem-independent regret guarantees. In this work, we provide
problem-dependent guarantees on both the regret and the amount of feedback requested. In
particular, we derive problem-dependent lower bounds on the required feedback
and show that there is a fundamental difference between problems with a unique
optimal arm and those with multiple optimal arms. Furthermore, we present a new algorithm called
BuFALU for which we derive problem-dependent regret and cumulative feedback
bounds. Notably, we show that BuFALU naturally adapts to the number of optimal
arms.
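
To make the feedback protocol concrete, the following minimal Python sketch shows a budgeted bandit loop: the learner pulls an arm every round, but the reward is only observed when the learner spends a unit of feedback budget to ask for it. The query rule used here (keep asking while the pulled arm's confidence interval still overlaps another arm's) is an illustrative UCB-style heuristic; the class name `BudgetedUCB` and all parameters are hypothetical and this is not the BuFALU algorithm.

```python
import numpy as np

class BudgetedUCB:
    """Illustrative sketch of the budgeted-bandit protocol (NOT BuFALU):
    rewards are only observed when the learner explicitly asks for
    feedback, and each query consumes one unit of budget."""

    def __init__(self, n_arms, horizon):
        self.n_arms = n_arms
        self.horizon = horizon
        self.t = 0
        self.counts = np.zeros(n_arms)   # feedback observations per arm
        self.means = np.zeros(n_arms)    # empirical means over observed rounds

    def _radius(self, k):
        # Hoeffding-style confidence radius; infinite until first observation.
        if self.counts[k] == 0:
            return np.inf
        return np.sqrt(2.0 * np.log(max(self.t, 2)) / self.counts[k])

    def select_arm(self):
        ucb = [self.means[k] + self._radius(k) for k in range(self.n_arms)]
        return int(np.argmax(ucb))

    def should_query(self, arm, budget_left):
        # Illustrative heuristic: ask for feedback only while the identity of
        # the best arm is unsettled, i.e., some other arm's UCB still exceeds
        # the pulled arm's lower confidence bound.
        if budget_left <= 0:
            return False
        lcb = self.means[arm] - self._radius(arm)
        return any(self.means[j] + self._radius(j) >= lcb
                   for j in range(self.n_arms) if j != arm)

    def update(self, arm, reward):
        self.t += 1
        if reward is not None:  # feedback was asked, so the reward is observed
            self.counts[arm] += 1
            self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run(true_means, horizon, budget, seed=0):
    rng = np.random.default_rng(seed)
    agent = BudgetedUCB(len(true_means), horizon)
    queries = 0
    for _ in range(horizon):
        arm = agent.select_arm()
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        if agent.should_query(arm, budget - queries):
            queries += 1
            agent.update(arm, reward)  # reward observed
        else:
            agent.update(arm, None)    # reward earned but never observed
    return agent.means, queries

means, used = run([0.9, 0.5, 0.4], horizon=10_000, budget=500)
print("empirical means:", means, "| feedback used:", used)
```

Note how the loop separates earning reward from observing it: the environment draws a reward every round, but the estimates are only updated on queried rounds, which is the distinguishing feature of this setting.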
Related papers
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it (a minimal sketch of this protocol appears after this list).
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability and demonstrate that both the lower and upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
- BanditQ: Fair Bandits with Guaranteed Rewards [10.74025233418392]
Classic no-regret multi-armed bandit algorithms are inherently unfair by design.
We propose a new online policy, called BanditQ, that achieves the target reward rates while incurring a penalty in regret and target-rate violation.
The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem.
arXiv Detail & Related papers (2023-04-11T13:39:47Z)
- Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ two non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z)
- Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget [6.759124697337311]
We consider the combinatorial bandit problem with semi-bandit feedback under finite sampling budget constraints.
The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received.
We suggest a generic algorithm that covers the full spectrum of conceivable arm-elimination strategies.
arXiv Detail & Related papers (2022-02-09T14:36:05Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that Confidence-Budget Matching (CBM) based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
- Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
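
For contrast with the query-budget loop sketched above, the abstention protocol from "Multi-Armed Bandits with Abstention" (referenced earlier in this list) can be illustrated in a few lines. This is a sketch under stated assumptions, not the paper's algorithm: the fixed `abstain_payoff`, the threshold decision rule, and the choice that an abstained reward is never observed are all assumptions of this illustration.

```python
import numpy as np

def abstention_round(rng, true_means, est_means, abstain_payoff=0.5):
    """One round of a hypothetical abstention protocol: the agent commits to
    an arm and, BEFORE observing the instantaneous reward, may abstain from
    accepting it in exchange for a fixed payoff (an assumption of this
    sketch)."""
    arm = int(np.argmax(est_means))                 # any selection rule works
    reward = float(rng.random() < true_means[arm])  # drawn now, not yet shown
    abstain = est_means[arm] < abstain_payoff       # decided BEFORE observing
    if abstain:
        return arm, abstain_payoff, None            # fixed payoff; reward unseen
    return arm, reward, reward                      # accept: collect and observe

rng = np.random.default_rng(1)
print(abstention_round(rng, true_means=[0.9, 0.3], est_means=[0.6, 0.4]))
```

The structural point is the ordering: the accept/abstain decision is committed before the realized reward is revealed, which is what distinguishes this setting from simply skipping a round.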
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.