An Optimization-based Algorithm for Non-stationary Kernel Bandits
without Prior Knowledge
- URL: http://arxiv.org/abs/2205.14775v1
- Date: Sun, 29 May 2022 21:32:53 GMT
- Title: An Optimization-based Algorithm for Non-stationary Kernel Bandits
without Prior Knowledge
- Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari
- Abstract summary: We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity.
Our algorithm enjoys a tighter dynamic regret bound than previous work on the non-stationary kernel bandit setting.
- Score: 23.890686553141798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an algorithm for non-stationary kernel bandits that does not
require prior knowledge of the degree of non-stationarity. The algorithm
follows randomized strategies obtained by solving optimization problems that
balance exploration and exploitation. It adapts to non-stationarity by
restarting when a change in the reward function is detected. Our algorithm
enjoys a tighter dynamic regret bound than previous work on the non-stationary
kernel bandit setting. Moreover, when applied to the non-stationary linear
bandit setting by using a linear kernel, our algorithm is nearly minimax
optimal, solving an open problem in the non-stationary linear bandit
literature. We extend our algorithm to use a neural network for dynamically
adapting the feature mapping to observed data. We prove a dynamic regret bound
for the extension using neural tangent kernel theory. We demonstrate
empirically that our algorithm and the extension can adapt to varying degrees
of non-stationarity.
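The mechanism described in the abstract, fitting a kernel model to data gathered since the last restart, trading off exploration and exploitation, and restarting when a change in the reward function is detected, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration rather than the paper's actual procedure: the UCB-style acquisition is a stand-in for the optimization-based randomized strategy, and the residual-based change test, the RBF kernel, the reward function, and all constants are hypothetical choices made only for the demo.

```python
# Minimal sketch of a restart-on-change-detection kernel bandit loop.
# Not the paper's algorithm: UCB acquisition and the residual change test
# are illustrative stand-ins for its optimization-based randomized strategy
# and detection statistic.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def posterior(X, y, Xq, noise=0.05):
    """GP posterior mean/std at query points Xq given data (X, y)."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    Kq = rbf_kernel(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Kq, np.linalg.solve(K, Kq.T))
    return mean, np.sqrt(np.clip(var, 1e-12, None))

rng = np.random.default_rng(0)
arms = rng.uniform(0, 1, size=(50, 1))           # discretized action set
T, beta, detect_thresh = 400, 2.0, 3.0           # horizon, UCB width, restart threshold

def reward_fn(x, t):                             # abruptly changing reward (demo only)
    return np.sin(6 * x) if t < T // 2 else np.cos(6 * x)

X_hist, y_hist = [], []
for t in range(T):
    if len(X_hist) < 5:                          # cold start after each restart: explore
        idx = rng.integers(len(arms))
    else:
        X, y = np.array(X_hist), np.array(y_hist)
        mean, std = posterior(X, y, arms)
        idx = int(np.argmax(mean + beta * std))  # UCB stand-in for the randomized strategy
        # crude change test: restart if the latest reward is improbable under the model
        mu_last, sd_last = posterior(X[:-1], y[:-1], X[-1:])
        if abs(y[-1] - mu_last[0]) > detect_thresh * (sd_last[0] + 0.05):
            X_hist, y_hist = [], []              # change detected: forget data and restart
    x = arms[idx]
    y_obs = reward_fn(x[0], t) + 0.05 * rng.standard_normal()
    X_hist.append(x)
    y_hist.append(y_obs)
```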
Related papers
- Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization.
We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels.
We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z) - Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? [17.220126722355626]
We study the problem of Non-Stationary Reinforcement Learning (NS-RL) without prior knowledge about the system's non-stationarity.
A state-of-the-art, black-box algorithm, known as MASTER, is considered, with a focus on identifying the conditions under which it can achieve its stated goals.
arXiv Detail & Related papers (2024-10-17T17:09:56Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted WeightedOFUL$^+$ and Restarted SAVE$^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience.
We assume a kernelized structure where the kernel is unknown but shared across all tasks.
Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z) - On Adaptivity in Non-stationary Stochastic Optimization With Bandit
Feedback [11.208914594208654]
We show that, when the aggregate amount of function change is known a priori, a simple restarting algorithm attains the optimal dynamic regret.
We also establish an additional result showing that any algorithm achieving good regret against stationary benchmarks with high probability could be automatically converted to an algorithm that achieves good regret against dynamic benchmarks.
arXiv Detail & Related papers (2022-10-11T16:16:34Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in a reproducing kernel Hilbert space (RKHS).
We introduce an algorithm, Fast-Slow GP-UCB, which combines a "fast but non-robust" component with a "slow but robust" one.
We argue that certain dependencies on the corruption level cannot be avoided.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)