Approximation Theory Based Methods for RKHS Bandits
- URL: http://arxiv.org/abs/2010.12167v5
- Date: Mon, 26 Jul 2021 02:04:09 GMT
- Title: Approximation Theory Based Methods for RKHS Bandits
- Authors: Sho Takemori, Masahiro Sato
- Abstract summary: The RKHS bandit problem is an online optimization problem of non-linear functions with noisy feedback.
There is no general algorithm for the adversarial RKHS bandit problem.
We propose efficient algorithms for the RKHS bandit problem and the first general algorithm for the adversarial RKHS bandit problem.
- Score: 9.391375268580806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The RKHS bandit problem (also called kernelized multi-armed bandit problem)
is an online optimization problem of non-linear functions with noisy feedback.
Although the problem has been extensively studied, there are unsatisfactory
results for some problems compared to the well-studied linear bandit case.
Specifically, there is no general algorithm for the adversarial RKHS bandit
problem. In addition, high computational complexity of existing algorithms
hinders practical application. We address these issues by considering a novel
amalgamation of approximation theory and the misspecified linear bandit
problem. Using an approximation method, we propose efficient algorithms for the
stochastic RKHS bandit problem and the first general algorithm for the
adversarial RKHS bandit problem. Furthermore, we empirically show that one of
our proposed methods has comparable cumulative regret to IGP-UCB and its
running time is much shorter.
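The recipe in the abstract, stated concretely: approximate the unknown RKHS reward with a finite-dimensional feature map, then run a linear bandit algorithm whose confidence bounds are widened to absorb the approximation error (the misspecified linear bandit view). The sketch below is a minimal illustration of that recipe, not the authors' algorithm: it uses random Fourier features as a stand-in for the paper's approximation-theory based features, and all constants (feature count m, assumed approximation error eps, noise level) are illustrative.
```python
# Minimal sketch (NOT the authors' algorithm): approximate an RKHS reward
# with m finite features, then run LinUCB with the confidence width inflated
# by an assumed approximation error eps. Random Fourier features stand in
# for the paper's approximation-theoretic features; constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 2, 50, 500
arms = rng.uniform(-1.0, 1.0, size=(n_arms, d))

def f_true(X):
    """Unknown RKHS reward: a sum of two RBF bumps (illustrative)."""
    centers = np.array([[0.5, -0.3], [-0.6, 0.7]])
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / 0.5).sum(-1)

# Random Fourier features approximating an RBF kernel with lengthscale l
m, l = 100, 0.5
W = rng.normal(0.0, 1.0 / l, size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
phi = lambda X: np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

lam, beta, eps = 1.0, 1.0, 0.05   # ridge param, confidence scale, assumed approx. error
Phi = phi(arms)                   # precomputed features, shape (n_arms, m)
A, bvec = lam * np.eye(m), np.zeros(m)
best, regret = f_true(arms).max(), 0.0

for t in range(T):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ bvec
    # UCB = linear estimate + elliptical width + extra slack for misspecification
    width = np.sqrt(np.einsum("ij,jk,ik->i", Phi, A_inv, Phi))
    a = int(np.argmax(Phi @ theta_hat + beta * width + eps))
    reward = f_true(arms[a:a + 1])[0] + 0.1 * rng.normal()
    A += np.outer(Phi[a], Phi[a])
    bvec += reward * Phi[a]
    regret += best - f_true(arms[a:a + 1])[0]

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```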
Related papers
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - A Simple Unified Framework for High Dimensional Bandit Problems [33.139925285802825]
We present a general analysis framework for the regret upper bound of our algorithm.
We show that our algorithm can be applied to different high-dimensional bandit problems.
arXiv Detail & Related papers (2021-02-18T21:35:32Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (a toy sketch of this construction appears after the related-papers list below).
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - A PAC algorithm in relative precision for bandit problem with costly
sampling [0.0]
We first propose a naive bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision.
We also propose an adaptive bandit algorithm which provides a PAC-solution with the same guarantees.
arXiv Detail & Related papers (2020-07-30T09:22:25Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
The regret bound depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm.
We introduce Fast-Slow GP-UCB, an algorithm that combines a "fast but non-robust" instance with a "slow but robust" one.
We also argue that certain dependencies on the corruption level cannot be avoided.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.