Bandits with Replenishable Knapsacks: the Best of both Worlds
- URL: http://arxiv.org/abs/2306.08470v1
- Date: Wed, 14 Jun 2023 12:34:00 GMT
- Title: Bandits with Replenishable Knapsacks: the Best of both Worlds
- Authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco
- Abstract summary: We study a natural generalization of the BwK framework which allows non-monotonic resource utilization.
Under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound.
- Score: 26.786438972889208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The bandits with knapsack (BwK) framework models online decision-making
problems in which an agent makes a sequence of decisions subject to resource
consumption constraints. The traditional model assumes that each action
consumes a non-negative amount of resources and the process ends when the
initial budgets are fully depleted. We study a natural generalization of the
BwK framework which allows non-monotonic resource utilization, i.e., resources
can be replenished by a positive amount. We propose a best-of-both-worlds
primal-dual template that can handle any online learning problem with
replenishment for which a suitable primal regret minimizer exists. In
particular, we provide the first positive results for the case of adversarial
inputs by showing that our framework guarantees a constant competitive ratio
$\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a
positive constant. Moreover, under a stochastic input model, our algorithm
yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which
complements existing instance-dependent bounds for the same setting. Finally,
we provide applications of our framework to some economic problems of practical
relevance.
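As an illustration of the primal-dual idea behind the paper, the following is a minimal sketch only: it assumes Hedge as the primal regret minimizer, full (not bandit) feedback, a single resource, and online gradient ascent on the dual variable. All function and parameter names are illustrative and not taken from the paper.

```python
import numpy as np

def primal_dual_bwk(rewards, costs, B, T, eta_primal=0.1, eta_dual=0.1):
    """Sketch of a primal-dual template for bandits with a replenishable
    knapsack. rewards[t, a] lies in [0, 1]; costs[t, a] may be negative,
    which models replenishment. Full feedback is assumed for simplicity;
    the paper's template plugs in a suitable primal regret minimizer to
    handle bandit feedback."""
    n_actions = rewards.shape[1]
    weights = np.ones(n_actions)   # Hedge weights (primal player)
    lam = 0.0                      # dual variable for the budget constraint
    rho = B / T                    # per-round budget rate
    budget = float(B)
    total_reward = 0.0
    for t in range(T):
        if budget <= 1:            # stop when resources are depleted
            break
        p = weights / weights.sum()
        a = np.random.choice(n_actions, p=p)
        total_reward += rewards[t, a]
        budget -= costs[t, a]      # a negative cost replenishes the budget
        # Primal update: Hedge on the Lagrangian payoff of each action
        lagrangian = rewards[t] - lam * costs[t]
        weights *= np.exp(eta_primal * lagrangian)
        weights /= weights.max()   # rescale for numerical stability
        # Dual update: projected gradient ascent on the budget violation
        lam = max(0.0, lam + eta_dual * (costs[t, a] - rho))
    return total_reward
```

The dual variable prices resource consumption: actions that deplete the budget are penalized, while replenishing actions (negative cost) are rewarded, which is where non-monotonic utilization enters the template.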
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for multi-armed bandit problems in which no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
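For readers unfamiliar with the explore-then-commit approach that SGB builds on, here is a minimal single-arm sketch; SGB's combinatorial, greedy set-selection and optimized exploration length are omitted, and all names here are illustrative.

```python
import random

def explore_then_commit(pull, n_arms, horizon, m):
    """Plain explore-then-commit: pull each arm m times, then commit to
    the empirically best arm for the remaining rounds. pull(a) returns a
    stochastic reward in [0, 1] for arm a."""
    means = [0.0] * n_arms
    total = 0.0
    t = 0
    # Exploration phase: uniform sampling of every arm
    for a in range(n_arms):
        for _ in range(m):
            r = pull(a)
            means[a] += r / m
            total += r
            t += 1
    # Commit phase: play the empirical best arm until the horizon
    best = max(range(n_arms), key=lambda a: means[a])
    while t < horizon:
        total += pull(best)
        t += 1
    return best, total
```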
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework.
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
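The Successive Rejects framework underlying Constrained-SR can be sketched as follows. This is the classic unconstrained algorithm of Audibert, Bubeck, and Munos for fixed-budget best-arm identification; the constraint-feasibility screening that Constrained-SR adds is omitted, and the helper names are illustrative.

```python
import math

def successive_rejects(pull, n_arms, budget):
    """Fixed-budget best-arm identification by Successive Rejects:
    run K - 1 phases, and at the end of each phase permanently reject
    the surviving arm with the lowest empirical mean. pull(a) returns
    a stochastic reward for arm a."""
    # Normalization constant log_bar(K) = 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    means = [0.0] * n_arms
    counts = [0] * n_arms
    n_prev = 0
    for k in range(1, n_arms):
        # Phase length n_k = ceil((B - K) / (log_bar * (K + 1 - k)))
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                counts[a] += 1
                means[a] += (pull(a) - means[a]) / counts[a]
        n_prev = n_k
        # Reject the empirically worst surviving arm
        worst = min(active, key=lambda a: means[a])
        active.remove(worst)
    return active[0]   # the single surviving arm is the recommendation
```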
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem [6.227074051959886]
Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty.
We introduce a natural generalization of the BwK problem that allows non-monotonic resource utilization.
arXiv Detail & Related papers (2022-09-24T14:02:05Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ both non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We cast online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to handle reward and cost functions flexibly.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.