Proportional Response: Contextual Bandits for Simple and Cumulative
Regret Minimization
- URL: http://arxiv.org/abs/2307.02108v3
- Date: Fri, 3 Nov 2023 00:54:25 GMT
- Title: Proportional Response: Contextual Bandits for Simple and Cumulative
Regret Minimization
- Authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill
- Abstract summary: We propose a new family of efficient bandit algorithms for the contextual bandit setting.
Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings.
- Score: 29.579719765255927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many applications, e.g. in healthcare and e-commerce, the goal of a
contextual bandit may be to learn an optimal treatment assignment policy at the
end of the experiment. That is, to minimize simple regret. However, this
objective remains understudied. We propose a new family of computationally
efficient bandit algorithms for the stochastic contextual bandit setting, where
a tuning parameter determines the weight placed on cumulative regret
minimization (where we establish near-optimal minimax guarantees) versus simple
regret minimization (where we establish state-of-the-art guarantees). Our
algorithms work with any function class, are robust to model misspecification,
and can be used in continuous arm settings. This flexibility comes from
constructing and relying on "conformal arm sets" (CASs). CASs provide a set of
arms for every context, encompassing the context-specific optimal arm with a
certain probability across the context distribution. Our positive results on
simple and cumulative regret guarantees are contrasted with a negative result,
which shows that no algorithm can achieve instance-dependent simple regret
guarantees while simultaneously achieving minimax optimal cumulative regret
guarantees.
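To make the abstract's central construct concrete, here is a minimal, hypothetical sketch of how a conformal arm set (CAS) might drive action selection. It is not the authors' algorithm: the confidence-radius rule, the uniform sampling, and all function names are assumptions for illustration only.

```python
import numpy as np

def conformal_arm_set(reward_est, context, arms, radius):
    """Keep every arm whose estimated reward is within `radius` of the
    estimated best arm for this context (an illustrative rule only; the
    paper's CASs are calibrated to contain the context-specific optimal
    arm with a target probability over the context distribution)."""
    preds = np.array([reward_est(context, a) for a in arms])
    return [a for a, p in zip(arms, preds) if p >= preds.max() - radius]

def sample_action(reward_est, context, arms, radius, rng):
    """Explore uniformly over the CAS: a tighter radius concentrates play
    on the greedy arm (cumulative-regret flavor), a looser one spreads
    exploration (simple-regret flavor), mirroring the tuning parameter."""
    cas = conformal_arm_set(reward_est, context, arms, radius)
    return cas[rng.integers(len(cas))]

# Toy usage with a linear reward estimate over 5 arms.
rng = np.random.default_rng(0)
theta = rng.normal(size=(5, 3))
est = lambda x, a: float(theta[a] @ x)
print(sample_action(est, rng.normal(size=3), list(range(5)), 0.5, rng))
```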
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The best arm identification (BAI) problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero (a KL-UCB index sketch follows below).
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
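Double KL-UCB builds on the classical KL-UCB index. Purely as background, here is the standard bisection computation of the Bernoulli KL-UCB index; this reconstructs the well-known index, not the paper's full BAI procedure.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n_pulls, t, tol=1e-6):
    """Largest q >= p_hat with n_pulls * KL(p_hat, q) <= log(t), found by
    bisection: the standard KL-UCB upper confidence bound."""
    budget = math.log(t) / n_pulls
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb_index(p_hat=0.4, n_pulls=20, t=100))  # roughly 0.72
```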
- Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints
We address a generalization of the bandit-with-knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform well under both stochastic and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z)
- Batched Nonparametric Contextual Bandits [21.031965676746776]
We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret (see the batching sketch below).
arXiv Detail & Related papers (2024-02-27T18:06:20Z)
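To make the "batch constraints" concrete: the policy may only be updated at a small number of pre-specified times. The sketch below uses a geometric batch grid, a common choice in the batched-bandit literature; the grid and the callback names are illustrative assumptions, not the paper's algorithm.

```python
def geometric_grid(T, M):
    """M batch endpoints t_1 < ... < t_M = T growing geometrically, so the
    policy is refit only ~M times over a horizon of T rounds."""
    a = T ** (1.0 / M)
    grid = sorted({int(round(a ** j)) for j in range(1, M + 1)})
    grid[-1] = T
    return grid

def run_batched(T, M, choose, observe, refit):
    """Play `choose` within each batch; call `refit` only at batch ends,
    so the policy stays constant inside every batch."""
    history, t = [], 0
    for end in geometric_grid(T, M):
        while t < end:
            arm = choose(t)
            history.append((arm, observe(arm)))
            t += 1
        refit(history)  # the only policy updates: one per batch

print(geometric_grid(10_000, 5))  # [6, 40, 251, 1585, 10000]
```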
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary (a generic dual-update sketch follows below).
arXiv Detail & Related papers (2022-09-15T16:59:19Z)
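A standard primitive behind online learning with long-term constraints, as in the framework above, is a Lagrangian dual update: keep a multiplier per constraint and increase it when the constraint is violated. The step below is shown purely as background; it is not claimed to be the paper's best-of-both-worlds method.

```python
import numpy as np

def dual_update(lmbda, violations, eta):
    """Projected gradient-ascent step on the Lagrange multipliers:
    lambda_j <- max(0, lambda_j + eta * g_j(x_t)), where g_j(x_t) > 0
    means constraint j was violated in round t."""
    return np.maximum(0.0, lmbda + eta * violations)

# Each round the learner picks x_t trading reward against the penalty
# lmbda @ g(x_t), then nudges lmbda toward the violated constraints.
lmbda = np.zeros(3)
for g_t in [np.array([0.2, -0.1, 0.0]), np.array([-0.3, 0.4, 0.1])]:
    lmbda = dual_update(lmbda, g_t, eta=0.1)
print(lmbda)  # only constraints that were violated keep positive weight
```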
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high-dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method (see the sketch below).
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
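The summary's two ingredients can be illustrated separately: an epoch schedule under which the parameter is refit only at epoch boundaries, and exhaustive best-subset selection by least squares. Both sketches are assumptions for illustration; in particular, a simple doubling schedule stands in for the paper's more specific "doubly growing" epochs, and exhaustive search is tractable only for small dimension.

```python
from itertools import combinations
import numpy as np

def best_subset_ls(X, y, k):
    """Exhaustive best-subset selection: least-squares fit on every support
    of size k, keeping the one with the smallest residual sum of squares."""
    best_rss, best_S, best_beta = np.inf, None, None
    for S in combinations(range(X.shape[1]), k):
        Xs = X[:, list(S)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_S, best_beta = rss, S, beta
    return best_S, best_beta

def epoch_lengths(T):
    """A simple doubling schedule: the parameter is re-estimated only at
    epoch boundaries, so only O(log T) updates occur over the horizon."""
    n, out = 1, []
    while sum(out) + n <= T:
        out.append(n)
        n *= 2
    return out

print(epoch_lengths(100))  # [1, 2, 4, 8, 16, 32]
```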
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (see the sketch below).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
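The lenient criterion has a one-line instance: gaps below a threshold $\epsilon$ count as zero. The hard-threshold sketch below is one member of the family the paper studies, which also allows smoother gap functions.

```python
def lenient_regret(gaps, eps):
    """Hard-threshold instance of lenient regret: sum the suboptimality
    gaps of the actions taken, ignoring any gap smaller than eps."""
    return sum(g for g in gaps if g >= eps)

# A near-optimal action (gap 0.01) costs nothing under eps = 0.05,
# while standard cumulative regret would still charge for it.
print(lenient_regret([0.5, 0.01, 0.2, 0.0], eps=0.05))  # 0.7
```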
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret (a generic feasibility-filtering sketch follows below).
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
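The summary suggests a natural two-step selection rule: first keep the arms whose risk is plausibly within the constraint, via a lower confidence bound on risk, then pick the feasible arm with the best reward upper bound. The sketch below is that generic pattern under assumed names and a Hoeffding-style confidence width; it is not the paper's exact algorithm.

```python
import math

def rc_select(reward_mean, risk_mean, counts, t, tau):
    """Among arms whose risk LCB is below the threshold `tau`, pick the one
    with the highest reward UCB; fall back to the lowest-risk arm if no
    arm looks feasible (a generic pattern, not the paper's exact rule)."""
    conf = [math.sqrt(2 * math.log(t) / n) for n in counts]
    feasible = [i for i in range(len(counts)) if risk_mean[i] - conf[i] <= tau]
    if not feasible:
        return min(range(len(counts)), key=lambda i: risk_mean[i])
    return max(feasible, key=lambda i: reward_mean[i] + conf[i])

print(rc_select([0.5, 0.9], [0.1, 0.8], counts=[10, 10], t=100, tau=0.3))
```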
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.