Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
- URL: http://arxiv.org/abs/2210.11834v1
- Date: Fri, 21 Oct 2022 09:28:53 GMT
- Title: Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
- Authors: Yuxuan Han, Jialin Zeng, Yang Wang, Yang Xiang, Jiheng Zhang
- Abstract summary: We study the contextual bandit with knapsacks (CBwK) problem, where each action leads to a random reward but also costs a random resource consumption in a vector form.
We propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression.
- Score: 14.634964681825197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the stochastic contextual bandit with knapsacks (CBwK) problem,
where each action, taken upon a context, not only leads to a random reward but
also costs a random resource consumption in a vector form. The challenge is to
maximize the total reward without violating the budget for each resource. We
study this problem under a general realizability setting where the expected
reward and expected cost are functions of contexts and actions in some given
general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively.
Existing works on CBwK are restricted to the linear function class since they
use UCB-type algorithms, which heavily rely on the linear form and thus are
difficult to extend to general function classes. Motivated by online regression
oracles that have been successfully applied to contextual bandits, we propose
the first universal and optimal algorithmic framework for CBwK by reducing it
to online regression. We also establish the lower regret bound to show the
optimality of our algorithm for a variety of function classes.
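The reduction sketched in the abstract can be illustrated with a toy primal-dual loop. This is not the authors' algorithm: it is a minimal sketch in which regression estimates of the reward function $f$ and cost function $g$ (replaced here by the true means, standing in for online-regression predictions) are combined through a Lagrangian price that rises whenever per-round spending exceeds the average budget $B/T$. All names, the greedy action rule, and the step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CBwK instance: K actions, one resource, horizon T, budget B.
K, T, B = 4, 2000, 400
mean_reward = rng.uniform(0.2, 0.8, size=K)  # stands in for f(x, a)
mean_cost = rng.uniform(0.1, 0.6, size=K)    # stands in for g(x, a)

lam = 0.0               # dual variable (resource price)
eta = 1.0 / np.sqrt(T)  # dual step size (illustrative choice)
budget_left = float(B)
total_reward = 0.0

for t in range(T):
    if budget_left <= 1.0:  # stop before the budget can be violated
        break
    # Lagrangian-adjusted scores: predicted reward minus priced cost.
    scores = mean_reward - lam * mean_cost
    a = int(np.argmax(scores))  # greedy sketch; a real algorithm randomizes
    r = rng.binomial(1, mean_reward[a])
    c = rng.binomial(1, mean_cost[a])
    total_reward += r
    budget_left -= c
    # Dual gradient step: raise the price when spending above B/T per round.
    lam = max(0.0, lam + eta * (c - B / T))
```

Because each cost is at most 1 and the loop stops once fewer than one unit of budget remains, the constraint is never violated; the price `lam` does the work of trading reward against consumption.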
Related papers
- Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex.
We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
arXiv Detail & Related papers (2024-02-14T04:03:38Z)
- Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z)
- Contextual bandits with concave rewards, and an application to fair ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR).
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for an unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence-bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability [18.45278329799526]
We design a fast and simple algorithm that achieves the statistically optimal regret with only $O(\log T)$ calls to an offline regression oracle.
Our results provide the first universal and optimal reduction from contextual bandits to offline regression.
arXiv Detail & Related papers (2020-03-28T04:16:52Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
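The entries above on reducing contextual bandits to online regression rely on turning a predictor's outputs into a sampling distribution; a common device in this line of work is an inverse-gap-weighting rule, sketched below. The function name, the `gamma` value, and the toy predictions are illustrative, not taken from any of the papers.

```python
import numpy as np

def inverse_gap_weighting(pred, gamma):
    """Map predicted rewards to a sampling distribution over actions.

    Actions with a larger predicted gap to the best action are played
    with proportionally smaller probability; gamma controls how greedy
    the distribution is (larger gamma -> closer to argmax).
    """
    pred = np.asarray(pred, dtype=float)
    K = len(pred)
    best = int(np.argmax(pred))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            # Probability shrinks with the predicted gap to the best action.
            p[a] = 1.0 / (K + gamma * (pred[best] - pred[a]))
    p[best] = 1.0 - p.sum()  # remaining mass goes to the predicted best
    return p

# Illustrative call: three actions with predicted rewards 0.9, 0.5, 0.1.
probs = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0)
```

Each non-best probability is at most $1/K$, so the predicted best action always retains at least $1/K$ of the mass and the result is a valid distribution.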
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.