Adversarial Contextual Bandits Go Kernelized
- URL: http://arxiv.org/abs/2310.01609v1
- Date: Mon, 2 Oct 2023 19:59:39 GMT
- Title: Adversarial Contextual Bandits Go Kernelized
- Authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili
- Abstract summary: We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and achieve near-optimal regret guarantees.
- Score: 21.007410990554522
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study a generalization of the problem of online learning in adversarial
linear contextual bandits by incorporating loss functions that belong to a
reproducing kernel Hilbert space, which allows for a more flexible modeling of
complex decision-making scenarios. We propose a computationally efficient
algorithm that makes use of a new optimistically biased estimator for the loss
functions and achieves near-optimal regret guarantees under a variety of
eigenvalue decay assumptions made on the underlying kernel. Specifically, under
the assumption of polynomial eigendecay with exponent $c>1$, the regret is
$\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number
of rounds and $K$ the number of actions. Furthermore, when the eigendecay
follows an exponential pattern, we achieve an even tighter regret bound of
$\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special
cases where lower bounds are known at all, and match the best known upper
bounds available for the more well-studied stochastic counterpart of our
problem.
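As a concrete reading of the polynomial-decay rate above (the particular values of $c$ are illustrative, not taken from the paper), plugging exponents into $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$ gives

$$c=2:\ \widetilde{O}(KT^{3/4}), \qquad c=4:\ \widetilde{O}(KT^{5/8}), \qquad c\to\infty:\ \widetilde{O}(KT^{1/2}),$$

so faster eigendecay pushes the exponent toward the $\widetilde{O}(\sqrt{T})$ rate attained under exponential eigendecay.

The "optimistically biased estimator" mentioned in the abstract can be illustrated, in a much simpler finite-armed setting, by an inverse-propensity loss estimate with an exploration bonus subtracted. The sketch below is a generic illustration of that idea only; the function name, the constant bonus, and the uniform sampling are assumptions for exposition, not the paper's kernelized construction:

```python
import numpy as np

def optimistic_loss_estimate(K, chosen_arm, observed_loss, probs, bonus):
    """Inverse-propensity loss estimate minus a bonus (scalar or per-arm array).

    The IPW term alone is unbiased; subtracting the bonus biases every
    estimate downward, i.e. makes it optimistic. Illustration only --
    NOT the RKHS-based estimator constructed in the paper.
    """
    est = np.zeros(K)
    est[chosen_arm] = observed_loss / probs[chosen_arm]  # IPW term
    return est - bonus  # downward (optimistic) bias

# Toy usage: uniform sampling over K arms and a small constant bonus.
rng = np.random.default_rng(0)
K = 5
probs = np.full(K, 1.0 / K)
arm = rng.choice(K, p=probs)
loss = rng.uniform()
print(optimistic_loss_estimate(K, arm, loss, probs, bonus=0.05))
```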
Related papers
- An Equivalence Between Static and Dynamic Regret Minimization [10.812831455376218]
We show that for linear losses, dynamic regret minimization is equivalent to static regret minimization in an extended decision space.
We provide an algorithm guaranteeing dynamic regret of the form $R_T(u_1,\dots,u_T)\le \widetilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-06-03T17:54:58Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical limitations of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\widetilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve a regret bound of $\widetilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the best existing $\widetilde{\mathcal{O}}(K^{2/3})$ regret to $\widetilde{\mathcal{O}}(\sqrt{K})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to learn losses and models jointly.
We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Parameter-free Regret in High Probability with Heavy Tails [45.11667443216861]
We present new algorithms for online convex optimization over unbounded domains.
We obtain parameter-free regret in high-probability given access only to potentially heavy-tailed subgradient estimates.
arXiv Detail & Related papers (2022-10-25T21:43:44Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)