On Kernelized Multi-Armed Bandits with Constraints
- URL: http://arxiv.org/abs/2203.15589v1
- Date: Tue, 29 Mar 2022 14:02:03 GMT
- Title: On Kernelized Multi-Armed Bandits with Constraints
- Authors: Xingyu Zhou and Bo Ji
- Abstract summary: We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
- Score: 16.102401271318012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a stochastic bandit problem with a general unknown reward function
and a general unknown constraint function. Both functions can be non-linear
(even non-convex) and are assumed to lie in a reproducing kernel Hilbert space
(RKHS) with a bounded norm. This kernelized bandit setup strictly generalizes
standard multi-armed bandits and linear bandits. In contrast to safety-type
hard constraints studied in prior works, we consider soft constraints that may
be violated in any round as long as the cumulative violations are small, which
is motivated by various practical applications. Our ultimate goal is to study
how to utilize the nature of soft constraints to attain a finer
complexity-regret-constraint trade-off in the kernelized bandit setting. To
this end, leveraging primal-dual optimization, we propose a general framework
for both algorithm design and performance analysis. This framework builds upon
a novel sufficient condition, which not only is satisfied under general
exploration strategies, including \emph{upper confidence bound} (UCB),
\emph{Thompson sampling} (TS), and new ones based on \emph{random exploration},
but also enables a unified analysis for showing both sublinear regret and
sublinear or even zero constraint violation. We demonstrate the superior
performance of our proposed algorithms via numerical experiments based on both
synthetic and real-world datasets. Along the way, we also make the first
detailed comparison between two popular methods for analyzing constrained
bandits and Markov decision processes (MDPs) by discussing the key difference
and some subtleties in the analysis, which could be of independent interest to
the communities.
Related papers
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study problems as pure exploration in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and the properties of convex to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
arXiv Detail & Related papers (2024-10-24T15:26:14Z) - Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform under both and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z) - Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z) - Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.