Non-stationary Bandits with Knapsacks
- URL: http://arxiv.org/abs/2205.12427v1
- Date: Wed, 25 May 2022 01:22:36 GMT
- Title: Non-stationary Bandits with Knapsacks
- Authors: Shang Liu, Jiashuo Jiang, Xiaocheng Li
- Abstract summary: We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ both non-stationarity measures to derive upper and lower bounds for the problem.
- Score: 6.2006721306998065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of bandits with knapsacks (BwK) in a
non-stationary environment. The BwK problem generalizes the multi-armed bandit
(MAB) problem to model the resource consumption associated with playing each
arm. At each time step, the decision maker/player chooses an arm to play, and
receives a reward while consuming a certain amount of each of the multiple
resource types. The objective is to maximize the cumulative reward
over a finite horizon subject to some knapsack constraints on the resources.
Existing works study the BwK problem under either a stochastic or adversarial
environment. Our paper considers a non-stationary environment which
continuously interpolates between these two extremes. We first show that the
traditional notion of variation budget is insufficient to characterize the
non-stationarity of the BwK problem for a sublinear regret due to the presence
of the constraints, and then we propose a new notion of global non-stationarity
measure. We employ both non-stationarity measures to derive upper and lower
bounds for the problem. Our results are based on a primal-dual analysis of the
underlying linear programs and highlight the interplay between the constraints
and the non-stationarity. Finally, we also extend the non-stationarity measure
to the problem of online convex optimization with constraints and obtain new
regret bounds accordingly.
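The interaction protocol described in the abstract (play an arm, collect a reward, consume resources, stop when a budget is exhausted or the horizon is reached) can be sketched as a short simulation. The greedy policy and all names below are illustrative assumptions for exposition, not the paper's algorithm:

```python
def run_bwk_episode(mean_rewards, mean_costs, budget, horizon):
    """Simulate one bandits-with-knapsacks (BwK) episode.

    mean_rewards[a]  : expected reward of arm a
    mean_costs[a][j] : expected consumption of resource j when playing arm a
    budget           : initial amount of every resource type
    horizon          : number of rounds T

    The episode ends at the horizon, or as soon as no arm fits within the
    remaining budgets (the knapsack constraint binds).
    """
    k = len(mean_rewards)
    m = len(mean_costs[0])
    remaining = [float(budget)] * m
    total_reward = 0.0
    for _ in range(horizon):
        feasible = [a for a in range(k)
                    if all(mean_costs[a][j] <= remaining[j] for j in range(m))]
        if not feasible:
            break  # some resource is exhausted before the horizon
        # Hypothetical greedy policy: best reward per unit of largest cost.
        arm = max(feasible,
                  key=lambda a: mean_rewards[a] / max(max(mean_costs[a]), 1e-12))
        total_reward += mean_rewards[arm]  # noiseless rewards, for clarity
        for j in range(m):
            remaining[j] -= mean_costs[arm][j]
    return total_reward, remaining
```

The natural benchmark in this setting is the fluid linear program that maximizes $\sum_a r_a x_a$ subject to $\sum_a c_{a,j} x_a \le B_j$ for each resource $j$ and $\sum_a x_a \le T$; the primal-dual analysis referenced above works with such LPs, whose dual variables act as resource prices.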
Related papers
- Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [28.94461817548213]
We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments.
We are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting.
arXiv Detail & Related papers (2024-07-01T04:12:15Z) - No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
We show that it is possible to circumvent the issue of sublinear violations of constraints by requiring the primal and dual algorithm to be weakly adaptive.
In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $\rho/(1+\rho)$.
These results allow us to obtain new results for the problem of contextual bandits with linear constraints.
arXiv Detail & Related papers (2024-05-10T16:22:33Z) - Bandits with Replenishable Knapsacks: the Best of both Worlds [26.786438972889208]
We study a natural generalization of the BwK framework which allows non-monotonic resource utilization.
Under an input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound.
arXiv Detail & Related papers (2023-06-14T12:34:00Z) - Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to flexibly handle general reward and cost functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits [66.02233330016435]
We provide problem-dependent guarantees on both the regret and the asked feedback.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
arXiv Detail & Related papers (2021-10-12T03:24:57Z) - The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based
Approach [15.626602292752624]
We develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound.
The sub-optimality measure highlights the important role of knapsacks in determining regret.
This is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
arXiv Detail & Related papers (2021-02-12T08:14:30Z) - Coordinated Online Learning for Multi-Agent Systems with Coupled
Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.