Kernel $\epsilon$-Greedy for Contextual Bandits
- URL: http://arxiv.org/abs/2306.17329v1
- Date: Thu, 29 Jun 2023 22:48:34 GMT
- Title: Kernel $\epsilon$-Greedy for Contextual Bandits
- Authors: Sakshi Arya and Bharath K. Sriperumbudur
- Abstract summary: We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits.
We propose an online weighted kernel ridge regression estimator for the reward functions.
- Score: 4.1347433277076036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a kernelized version of the $\epsilon$-greedy strategy for
contextual bandits. More precisely, in a setting with finitely many arms, we
consider that the mean reward functions lie in a reproducing kernel Hilbert
space (RKHS). We propose an online weighted kernel ridge regression estimator
for the reward functions. Under some conditions on the exploration probability
sequence, $\{\epsilon_t\}_t$, and choice of the regularization parameter,
$\{\lambda_t\}_t$, we show that the proposed estimator is consistent. We also
show that for any choice of kernel and the corresponding RKHS, we achieve a
sub-linear regret rate depending on the intrinsic dimensionality of the RKHS.
Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin
condition for finite-dimensional RKHS.
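
To make the strategy concrete, below is a minimal Python sketch of the kernel $\epsilon$-greedy loop. It is not the authors' implementation: it refits a plain (unweighted) kernel ridge regression per arm at every round, whereas the paper proposes an online weighted KRR estimator, and the RBF kernel, the $t^{-1/2}$ decays of $\epsilon_t$ and $\lambda_t$, and all names (`KernelEpsGreedy`, `rbf_kernel`) are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

class KernelEpsGreedy:
    """Sketch of kernel eps-greedy with a per-arm KRR reward estimate.

    Simplification (assumption): plain KRR refit from scratch each round,
    not the paper's online weighted KRR estimator.
    """

    def __init__(self, n_arms, gamma=1.0):
        self.n_arms = n_arms
        self.gamma = gamma
        self.contexts = [[] for _ in range(n_arms)]  # contexts seen per arm
        self.rewards = [[] for _ in range(n_arms)]   # rewards seen per arm
        self.t = 0                                   # round counter

    def _estimate(self, arm, x):
        """KRR prediction of the mean reward of `arm` at context `x`."""
        if not self.contexts[arm]:
            return 0.0                      # unexplored arm: neutral estimate
        Xa = np.asarray(self.contexts[arm])
        ya = np.asarray(self.rewards[arm])
        lam = self.t ** -0.5                # illustrative lambda_t decay
        K = rbf_kernel(Xa, Xa, self.gamma)
        alpha = np.linalg.solve(K + lam * np.eye(len(ya)), ya)
        return float(rbf_kernel(x[None, :], Xa, self.gamma) @ alpha)

    def select_arm(self, x, rng):
        """Explore uniformly with probability eps_t, else play greedily."""
        self.t += 1
        eps = min(1.0, self.t ** -0.5)      # illustrative epsilon_t decay
        if rng.random() < eps:
            return int(rng.integers(self.n_arms))
        return int(np.argmax([self._estimate(a, x) for a in range(self.n_arms)]))

    def update(self, arm, x, reward):
        """Record the observed (context, reward) pair for the played arm."""
        self.contexts[arm].append(x)
        self.rewards[arm].append(reward)
```

A driver would alternate `select_arm`, observe the reward, and call `update`; the paper's consistency and regret guarantees hinge on specific joint conditions on $\{\epsilon_t\}_t$ and $\{\lambda_t\}_t$, for which the decays above are placeholders only.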
Related papers
- Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism [1.4999444543328293]
We propose a model-based algorithm based on novel cost and reward function estimators.
Our algorithm achieves a regret upper bound of $\widetilde{\mathcal{O}}((\bar{C} - \bar{C}_b)^{-1} H^{2.5} S \sqrt{AK})$, where $\bar{C}$ is the cost budget for an episode.
arXiv Detail & Related papers (2024-10-14T04:51:06Z)
- The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels [0.0]
We prove that the minimax optimal rate of HSIC estimation on $\mathbb{R}^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal{O}\!\left(n^{-1/2}\right)$.
arXiv Detail & Related papers (2024-03-12T15:13:21Z)
- The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces [3.2931415075553576]
We analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm.
For dot-product kernels on the sphere, we identify conditions when the $L^\infty$ learning can be achieved with Hilbert samples.
arXiv Detail & Related papers (2023-06-05T12:29:13Z)
- Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d + \log(|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z)
- Spectral bounds of the $\varepsilon$-entropy of kernel classes [6.028247638616059]
We develop new bounds on the $\varepsilon$-entropy of a unit ball in a reproducing kernel Hilbert space induced by some Mercer kernel $K$.
In our approach, we exploit the ellipsoidal structure of the unit ball in the RKHS and previous work on covering numbers of ellipsoids in Euclidean space.
arXiv Detail & Related papers (2022-04-09T16:45:22Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some reproducing kernel Hilbert space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of the misspecification.
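Spelled out, the $\epsilon$-uniform approximability assumption in this summary can be written as follows, where $f$ is the unknown function, and $\mathcal{H}_k$, $\mathcal{X}$, and the norm bound $B$ are notation introduced here for illustration:
$$\exists\, g \in \mathcal{H}_k \ \text{with} \ \|g\|_{\mathcal{H}_k} \le B \quad \text{such that} \quad \sup_{x \in \mathcal{X}} |f(x) - g(x)| \le \epsilon.$$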
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
- Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function, but rewards are accrued only orthogonal to an unknown protected subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z)
- Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z)
- Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured in prior work.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)