Bandits with Knapsacks beyond the Worst-Case
- URL: http://arxiv.org/abs/2002.00253v7
- Date: Tue, 28 Dec 2021 17:55:19 GMT
- Title: Bandits with Knapsacks beyond the Worst-Case
- Authors: Karthik Abinav Sankararaman and Aleksandrs Slivkins
- Abstract summary: We present three results that go beyond the worst-case perspective.
First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates.
- Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds.
- Score: 87.54497614804409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under
supply/budget constraints. While worst-case regret bounds for BwK are
well-understood, we present three results that go beyond the worst-case
perspective. First, we provide upper and lower bounds which amount to a full
characterization for logarithmic, instance-dependent regret rates. Second, we
consider "simple regret" in BwK, which tracks the algorithm's performance in a
given round, and prove that it is small in all but a few rounds. Third, we
provide a general "reduction" from BwK to bandits which takes advantage of some
known helpful structure, and apply this reduction to combinatorial
semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our
results build on the BwK algorithm of Agrawal and Devanur (EC 2014), providing
new analyses thereof.
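To make the model concrete, here is a minimal single-resource sketch of a UCB-style rule for BwK: play the arm with the best optimistic reward-to-cost ratio until the budget or horizon runs out. The ratio rule and all names are illustrative assumptions; this is not the Agrawal-Devanur algorithm analyzed in the paper.

```python
import numpy as np

def bwk_ucb(arms, budget, horizon):
    """UCB-style heuristic for Bandits with Knapsacks with one resource.

    `arms` is a list of callables; each call returns a (reward, cost)
    pair in [0, 1] x [0, 1].  Stops when the budget or horizon runs out
    (the last pull may overshoot the budget slightly).
    """
    k = len(arms)
    n = np.zeros(k)        # pull counts
    rew = np.zeros(k)      # cumulative rewards per arm
    cost = np.zeros(k)     # cumulative costs per arm
    total, spent, t = 0.0, 0.0, 0
    while spent < budget and t < horizon:
        t += 1
        if t <= k:                                    # pull each arm once
            i = t - 1
        else:
            conf = np.sqrt(2.0 * np.log(t) / n)
            ucb_r = np.minimum(rew / n + conf, 1.0)   # optimistic reward
            lcb_c = np.maximum(cost / n - conf, 1e-6) # optimistic (low) cost
            i = int(np.argmax(ucb_r / lcb_c))         # best bang per buck
        r, c = arms[i]()
        n[i] += 1; rew[i] += r; cost[i] += c
        total += r; spent += c
    return total
```

For instance, `arms = [lambda: (rng.random(), 0.5), lambda: (0.4, 0.1)]` defines two arms with deterministic costs; the second is cheaper per unit of reward and should eventually dominate.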
Related papers
- Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon.
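As a rough illustration of the backbone such algorithms share, here is a hypothetical sketch of one round of linear Thompson sampling adapted to dueling feedback. It deliberately omits the paper's Feel-Good exploration term, and the Gaussian posterior form and names (`V`, `b`, `sigma2`) are assumptions.

```python
import numpy as np

def linear_ts_duel_round(X, V, b, sigma2=1.0, rng=None):
    """One round of linear Thompson sampling with dueling feedback.

    X: (K, d) feature matrix of the K arms; V: (d, d) regularized Gram
    matrix; b: (d,) response-weighted feature sum.  Samples a parameter
    from the Gaussian posterior and duels the two arms with the highest
    sampled scores.  Vanilla backbone only; no Feel-Good term.
    """
    rng = rng or np.random.default_rng()
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                     # ridge-style point estimate
    theta = rng.multivariate_normal(theta_hat, sigma2 * V_inv)
    scores = X @ theta
    top2 = np.argsort(scores)[-2:]            # the pair to duel this round
    return int(top2[1]), int(top2[0])
```

After observing which arm wins the duel, `V` and `b` would be updated with the feature difference of the dueled pair (a logistic update in the actual dueling model; a least-squares surrogate in this sketch).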
arXiv Detail & Related papers (2024-04-09T04:45:18Z)
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the adversarial combinatorial bandit problem with a switching cost $\lambda$ incurred for each selected arm that is switched between consecutive rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled by a classic semi-parametric family, the zero-inflated distribution.
We derive regret bounds both for multi-armed bandits with general reward assumptions and for contextual generalized linear bandits with sub-Gaussian rewards.
In many settings, the regret rates of our algorithms are either minimax optimal or state-of-the-art.
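The reward model itself is easy to state in code. The sketch below illustrates the distribution only, not the paper's algorithms: a zero-inflated distribution places an atom at zero and otherwise samples a base distribution.

```python
import numpy as np

def sample_zero_inflated(p_zero, base_sampler, size, rng=None):
    """Draw rewards from a zero-inflated distribution: with probability
    `p_zero` the reward is exactly 0, otherwise it comes from the base
    distribution given by `base_sampler(size, rng)`."""
    rng = rng or np.random.default_rng()
    nonzero = rng.random(size) >= p_zero
    return np.where(nonzero, base_sampler(size, rng), 0.0)

# Example: the mean of a zero-inflated Gaussian is (1 - p_zero) * mu.
rng = np.random.default_rng(0)
draws = sample_zero_inflated(0.7, lambda n, r: r.normal(2.0, 1.0, n), 100_000, rng)
print(draws.mean())  # close to 0.3 * 2.0 = 0.6
```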
arXiv Detail & Related papers (2023-12-25T03:13:21Z)
- Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
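A schematic of the modular Lagrangian idea, under assumed names: each round, regression estimates score every arm by a Lagrangian (reward minus priced resource consumption), and the dual prices drift toward the per-round budget rate. This is a sketch of the general primal-dual recipe, not the paper's algorithm.

```python
import numpy as np

def lagrangian_step(pred_reward, pred_cost, lam, budget_rate, eta):
    """One round of a Lagrangian scheme for bandits with linear constraints.

    pred_reward: (K,)   regression estimates of per-arm reward;
    pred_cost:   (K, d) estimates of per-arm resource consumption;
    lam:         (d,)   dual variables (prices), one per resource;
    budget_rate: (d,)   budget available per round;
    eta:         dual step size.  All names are illustrative.
    """
    scores = pred_reward - pred_cost @ lam       # Lagrangian score per arm
    arm = int(np.argmax(scores))                 # primal step: greedy arm
    # Dual step: raise the price of over-consumed resources, never below 0.
    lam = np.maximum(lam + eta * (pred_cost[arm] - budget_rate), 0.0)
    return arm, lam
```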
arXiv Detail & Related papers (2022-11-14T16:08:44Z)
- Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm [14.214707836697823]
We study the corrupted bandit problem, i.e., a multi-armed bandit problem with $k$ unknown reward distributions in which observed rewards may be corrupted.
We propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation.
We experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruption.
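For reference, Huber's M-estimator of location, the robust primitive HubUCB is reported to build on, can be computed by iterative reweighting. The sketch below is a generic implementation under assumed tolerances, not HubUCB itself.

```python
import numpy as np

def huber_mean(x, delta, n_iter=100, tol=1e-9):
    """Huber's M-estimator of location: the mu minimizing
    sum_i rho_delta(x_i - mu), computed by iteratively reweighted
    averaging.  Robust to heavy tails and corrupted samples."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                        # robust starting point
    for _ in range(n_iter):
        r = x - mu
        # Huber weights: 1 inside the delta band, delta/|r| outside.
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        new_mu = np.sum(w * x) / np.sum(w)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu
```

For example, `huber_mean([0.1, 0.2, 0.15, 50.0], delta=1.0)` stays near 0.15 despite the outlier, whereas the sample mean is pulled above 12.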
arXiv Detail & Related papers (2022-03-07T07:44:05Z)
- Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
We study the multi-armed bandit problem under semi-bandit feedback.
We consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that accounts only for the worst fraction of outcomes.
We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the bandit.
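To make CVaR concrete: at level $\alpha$, the empirical CVaR is the average of the worst $\alpha$-fraction of observed rewards. A small sketch of the measure itself (not the paper's algorithms):

```python
import numpy as np

def empirical_cvar(rewards, alpha):
    """Empirical CVaR at level alpha: the average of the worst
    alpha-fraction of rewards (lower tail)."""
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(rewards))))
    return rewards[:k].mean()

print(empirical_cvar([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], alpha=0.3))
# mean of the worst 30% {1, 2, 3} = 2.0, versus an overall mean of 5.5
```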
arXiv Detail & Related papers (2021-12-02T11:29:43Z)
- The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based Approach [15.626602292752624]
We develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound.
The sub-optimality measure highlights the important role of knapsacks in determining regret.
This is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
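The LP at the heart of such primal-dual analyses is the fluid relaxation of BwK: choose fractional pull counts maximizing expected reward subject to budget and horizon constraints. The sketch below solves it offline with known means, which an online algorithm would not have; all names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def bwk_fluid_lp(r, C, budgets, horizon):
    """Fluid LP for BwK: maximize x . r over pull counts x >= 0
    subject to resource constraints C^T x <= budgets and the time
    constraint sum(x) <= horizon.  r: (K,) mean rewards; C: (K, d)
    mean per-pull resource consumption."""
    K = len(r)
    A_ub = np.vstack([np.asarray(C).T, np.ones((1, K))])
    b_ub = np.append(budgets, horizon)
    res = linprog(-np.asarray(r), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * K)
    return res.x, -res.fun

# Two arms, one resource: arm 0 earns 1.0 per pull at cost 0.5,
# arm 1 earns 0.6 at cost 0.1; budget 50 over a horizon of 200 rounds.
x, opt = bwk_fluid_lp([1.0, 0.6], np.array([[0.5], [0.1]]),
                      budgets=[50], horizon=200)
print(x, opt)  # optimal mix pulls both arms: x = (75, 125), value 150
```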
arXiv Detail & Related papers (2021-02-12T08:14:30Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
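A minimal sketch of the idea, assuming each base algorithm exposes `select()` and `update(arm, reward)` (a hypothetical interface): run plain UCB1 over the bases' realized rewards. The paper's meta-UCB construction is more careful than this.

```python
import numpy as np

class MetaUCB:
    """Treat N base bandit algorithms as arms of a higher-level UCB1.

    Each round, a UCB score over the bases' realized rewards picks which
    base algorithm gets to act; that base then selects the actual arm.
    """

    def __init__(self, bases):
        self.bases = bases
        self.n = np.zeros(len(bases))   # times each base was chosen
        self.s = np.zeros(len(bases))   # reward accumulated by each base
        self.t = 0

    def step(self, pull):               # `pull(arm)` returns a reward
        self.t += 1
        if self.t <= len(self.bases):
            j = self.t - 1              # try every base once
        else:
            ucb = self.s / self.n + np.sqrt(2.0 * np.log(self.t) / self.n)
            j = int(np.argmax(ucb))
        base = self.bases[j]
        arm = base.select()
        r = pull(arm)
        base.update(arm, r)
        self.n[j] += 1; self.s[j] += r
        return j, arm, r
```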
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies that maximize cumulative reward while keeping each policy's expected cost below a given threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
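One schematic reading of the optimistic-pessimistic idea, for a fixed arm set rather than the paper's policy-level formulation: be optimistic about rewards, pessimistic about constraint consumption, and choose the best arm that still looks feasible. All names and the fallback rule are assumptions.

```python
import numpy as np

def oplb_style_select(X, theta_r, theta_c, V_inv, beta, tau):
    """Optimistic-for-reward, pessimistic-for-cost arm selection.

    X: (K, d) arm features; theta_r, theta_c: (d,) reward and cost
    estimates; V_inv: (d, d) inverse Gram matrix; beta: confidence
    radius; tau: constraint threshold."""
    # Elliptical confidence widths: beta * sqrt(x^T V^{-1} x) per arm.
    widths = beta * np.sqrt(np.einsum('kd,de,ke->k', X, V_inv, X))
    reward_ucb = X @ theta_r + widths        # optimistic reward
    cost_ucb = X @ theta_c + widths          # pessimistic (high) cost
    feasible = cost_ucb <= tau
    if not feasible.any():                   # fall back to the safest arm
        return int(np.argmin(cost_ucb))
    idx = np.flatnonzero(feasible)
    return int(idx[np.argmax(reward_ucb[idx])])
```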