Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with
Application to Fairness
- URL: http://arxiv.org/abs/2305.15807v2
- Date: Thu, 26 Oct 2023 11:52:12 GMT
- Title: Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with
Application to Fairness
- Authors: Evgenii Chzhen (LMO, CELESTE), Christophe Giraud (LMO, CELESTE), Zhen
Li, Gilles Stoltz (LMO, CELESTE, HEC Paris)
- Abstract summary: We consider contextual bandit problems with knapsacks [CBwK], a setting in which, at each round, a scalar reward is obtained and vector-valued costs are suffered.
We introduce a dual strategy based on projected-gradient-descent updates that can handle total-cost constraints of order $\sqrt{T}$ up to poly-logarithmic terms.
- Score: 1.6741394365746018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider contextual bandit problems with knapsacks [CBwK], a
setting in which, at each round, a scalar reward is obtained and vector-valued costs are
suffered. The learner aims to maximize the cumulative rewards while ensuring
that the cumulative costs are lower than some predetermined cost constraints.
We assume that contexts come from a continuous set, that costs can be signed,
and that the expected reward and cost functions, while unknown, may be
uniformly estimated -- a typical assumption in the literature. In this setting,
total-cost constraints so far had to be at least of order $T^{3/4}$, where $T$
is the number of rounds, and were typically even assumed to depend linearly on
$T$. We are, however, motivated to use CBwK to impose a fairness constraint of
equalized average costs between groups: the budget associated with the
corresponding cost constraints should be as close as possible to the natural
deviations, of order $\sqrt{T}$. To that end, we introduce a dual strategy
based on projected-gradient-descent updates that can handle total-cost
constraints of order $\sqrt{T}$ up to poly-logarithmic terms. This strategy is
more direct and simpler than existing strategies in the literature. It relies
on a careful, adaptive tuning of the step size.
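To make the dual approach concrete, here is a minimal sketch of one round of a generic dual projected-gradient template for CBwK. All names and signatures below (`r_hat`, `c_hat`, the per-round budget `b`, the projection radius) are illustrative assumptions, and the paper's key ingredient, the adaptive tuning of the step size `gamma`, is not reproduced here.

```python
import numpy as np

def project(lmbda, radius):
    """Project onto the nonnegative orthant intersected with an L2 ball."""
    lmbda = np.maximum(lmbda, 0.0)
    norm = np.linalg.norm(lmbda)
    return lmbda if norm <= radius else lmbda * (radius / norm)

def cbwk_dual_round(lmbda, x, actions, r_hat, c_hat, b, gamma, radius):
    """One round of a dual projected-gradient strategy for CBwK (sketch).

    r_hat(x, a): uniform estimate of the expected reward of action a in context x
    c_hat(x, a): uniform estimate of the expected cost vector (costs may be signed)
    b: per-round budget vector, i.e., the total-cost constraint divided by T
    gamma: step size (fixed here; tuned adaptively in the paper)
    """
    # Primal step: play the action with the best estimated Lagrangian value.
    a = max(actions, key=lambda act: r_hat(x, act) - lmbda @ c_hat(x, act))
    # Dual step: projected-gradient update on the budget slack.
    lmbda = project(lmbda + gamma * (c_hat(x, a) - b), radius)
    return a, lmbda
```

The dual variable grows when the chosen action over-spends the per-round budget and shrinks otherwise; this negative feedback is what lets the cumulative spend track a budget as small as order $\sqrt{T}$.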
Related papers
- No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
We show that it is possible to circumvent the issue of sublinear violations of constraints by requiring the primal and dual algorithms to be weakly adaptive.
In the former case, we show that the algorithm guarantees sublinear regret; in the latter, we establish a tight competitive ratio of $\rho/(1+\rho)$.
These results allow us to obtain new results for the problem of contextual bandits with linear constraints.
arXiv Detail & Related papers (2024-05-10T16:22:33Z) - Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits [81.60136088841948]
We propose an algorithm that minimizes the regret over the time horizon $T$.
The proposed algorithm is applicable to domains such as recommendation systems and transportation.
arXiv Detail & Related papers (2023-01-31T03:49:00Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Contextual Bandits with Knapsacks for a Conversion Model [1.8659901274492419]
We consider contextual bandits with knapsacks, with an underlying structure linking the rewards generated and the cost vectors suffered.
We show that the techniques introduced in this article may also be applied to the latter case.
Namely, the adaptive policies exhibited rely at each round on upper-confidence estimates of the probabilities of conversion given the action $a$ and the context $\mathbf{x}$.
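As a rough illustration of such upper-confidence estimates (not the paper's exact construction: the Hoeffding-style width and the optimistic default below are assumptions), one could compute:

```python
import math

def ucb_conversion(successes, pulls, t, delta=0.05):
    """Upper-confidence estimate of a conversion probability (sketch).

    successes, pulls: counts for a given (action, context-bin) pair
    t: current round; delta: confidence level
    """
    if pulls == 0:
        return 1.0  # optimistic default for never-tried pairs
    p_hat = successes / pulls
    width = math.sqrt(math.log(2 * t / delta) / (2 * pulls))  # Hoeffding-style
    return min(1.0, p_hat + width)
```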
arXiv Detail & Related papers (2022-06-01T08:29:07Z) - On Dynamic Pricing with Covariates [6.6543199581017625]
We show that UCB and Thompson sampling-based pricing algorithms can achieve an $O(d\sqrt{T}\log T)$ regret upper bound.
Our upper bound on the regret matches the lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-12-25T16:30:13Z) - Stochastic Shortest Path: Minimax, Parameter-Free and Towards
Horizon-Free Regret [144.6358229217845]
We study the problem of learning in the shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus.
We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_\star \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, and $A$ is the number of actions.
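The summary names two ingredients, skewed transitions and bonus-perturbed costs; the sketch below illustrates both. The bonus shape and the skewing rule here are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np

def optimistic_ssp_model(counts, emp_cost, emp_p, goal, c=1.0):
    """Build an optimistic SSP model, EB-SSP style (sketch).

    counts: visit counts n(s, a), shape (S, A)
    emp_cost: empirical mean costs, shape (S, A)
    emp_p: empirical transitions P_hat(s' | s, a), shape (S, A, S)
    goal: index of the goal state
    """
    n = np.maximum(counts, 1)
    bonus = c * np.sqrt(np.log(n + 1.0) / n)        # exploration bonus
    cost_tilde = np.maximum(emp_cost - bonus, 0.0)  # perturbed (optimistic) costs
    # Skew transitions: move a little probability mass toward the goal state,
    # which keeps the goal reachable in the optimistic model.
    sigma = np.minimum(bonus, 1.0)
    p_tilde = (1.0 - sigma[..., None]) * emp_p
    p_tilde[..., goal] += sigma
    return cost_tilde, p_tilde
```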
arXiv Detail & Related papers (2021-04-22T17:20:48Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
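As a toy illustration of the best-subset estimation step (the epoch schedule and the bandit interaction are omitted; the exhaustive search below is an assumption and is only practical for small dimensions):

```python
from itertools import combinations
import numpy as np

def best_subset_ols(X, y, k):
    """Exhaustive best-subset least squares of sparsity k (toy sketch)."""
    best_cols, best_beta, best_rss = None, None, np.inf
    for subset in combinations(range(X.shape[1]), k):
        cols = list(subset)
        # Fit ordinary least squares restricted to the candidate support.
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        residual = y - X[:, cols] @ beta
        rss = residual @ residual
        if rss < best_rss:
            best_cols, best_beta, best_rss = cols, beta, rss
    return best_cols, best_beta
```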
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while respectively incurring additional additive corruption terms.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward is maximized while the expected cost of each policy remains below a given threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Budget-Constrained Bandits over General Cost and Reward Distributions [32.63624728528415]
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return.
We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$.
arXiv Detail & Related papers (2020-02-29T23:50:08Z)