Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in
Contextual Bandit Algorithms
- URL: http://arxiv.org/abs/2106.02979v1
- Date: Sat, 5 Jun 2021 22:30:21 GMT
- Title: Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in
Contextual Bandit Algorithms
- Authors: Qin Ding, Yi-Wei Liu, Cho-Jui Hsieh, James Sharpnack
- Abstract summary: The contextual bandit problem models the trade-off between exploration and exploitation.
We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
- Score: 74.55200180156906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic contextual bandit problem, which models the trade-off between
exploration and exploitation, has many real applications, including recommender
systems, online advertising and clinical trials. Like many other machine
learning algorithms, contextual bandit algorithms often have one or more
hyper-parameters. For example, in most optimal stochastic contextual bandit
algorithms, there is an unknown exploration parameter which controls the
trade-off between exploration and exploitation. A proper choice of the
hyper-parameters is essential for contextual bandit algorithms to perform well.
However, it is infeasible to use offline tuning methods to select
hyper-parameters in a contextual bandit environment, since there is no
pre-collected dataset and decisions have to be made in real time. To tackle
this problem, we first propose a two-layer bandit structure for auto-tuning the
exploration parameter and then generalize it to the Syndicated Bandits
framework, which can learn multiple hyper-parameters dynamically in a
contextual bandit environment. We show that our Syndicated Bandits framework
achieves optimal regret upper bounds and is general enough to handle the tuning
tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS and UCB-GLM.
Experiments on both synthetic and real datasets validate the effectiveness of
our proposed framework.
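
To make the two-layer structure described above concrete, here is a minimal sketch (not the authors' implementation): an outer adversarial bandit, here EXP3, picks the exploration parameter alpha from a candidate set each round, and an inner LinUCB learner uses that alpha to choose an arm. The candidate set, the choice of EXP3 for the outer layer, the synthetic environment and all dimensions are illustrative assumptions.

```python
# A minimal sketch of two-layer auto-tuning of LinUCB's exploration
# parameter. All constants and the environment are illustrative, not
# taken from the paper.
import numpy as np

class LinUCB:
    def __init__(self, d):
        self.A = np.eye(d)      # regularized Gram matrix
        self.b = np.zeros(d)    # running sum of reward-weighted contexts

    def choose(self, contexts, alpha):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # UCB score: estimated reward + alpha * confidence width
        scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def exp3_probs(weights, gamma):
    # EXP3 sampling distribution: normalized weights mixed with uniform
    w = weights / weights.sum()
    return (1 - gamma) * w + gamma / len(weights)

# --- one auto-tuning run on a synthetic linear environment ---
rng = np.random.default_rng(0)
d, K, T, gamma = 5, 10, 2000, 0.1
alphas = np.array([0.1, 0.5, 1.0, 2.0])   # candidate exploration parameters
weights = np.ones(len(alphas))            # EXP3 weights over candidates
theta_star = rng.normal(size=d) / np.sqrt(d)
learner = LinUCB(d)

for t in range(T):
    contexts = rng.normal(size=(K, d)) / np.sqrt(d)
    probs = exp3_probs(weights, gamma)
    j = rng.choice(len(alphas), p=probs)        # outer layer picks alpha
    arm = learner.choose(contexts, alphas[j])   # inner layer picks an arm
    reward = contexts[arm] @ theta_star + 0.1 * rng.normal()
    learner.update(contexts[arm], reward)
    # importance-weighted EXP3 update, reward clipped to [0, 1]
    r = float(np.clip(reward, 0.0, 1.0))
    weights[j] *= np.exp(gamma * r / (probs[j] * len(alphas)))
    weights /= weights.sum()                    # keep weights numerically stable
```

This sketch tunes only the single exploration parameter; per the abstract, the full Syndicated Bandits framework extends the same idea to learn multiple hyper-parameters at once.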
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose a gap-based exploration algorithm (CombGapE) whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially chooses actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- A Simple Unified Framework for High Dimensional Bandit Problems [33.139925285802825]
We present a general analysis framework for the regret upper bound of our algorithm.
We show that our algorithm can be applied to different high dimensional bandit problems.
arXiv Detail & Related papers (2021-02-18T21:35:32Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Hyper-parameter Tuning for the Contextual Bandit [22.721128745617076]
We study the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function.
Our proposed algorithm learns to choose the right exploration parameter in an online manner based on the observed context.
We present two algorithms that use a bandit to find the optimal exploration parameter of the contextual bandit algorithm.
arXiv Detail & Related papers (2020-05-04T17:20:19Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider these problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)