Adversarial Contextual Bandits Go Kernelized
- URL: http://arxiv.org/abs/2310.01609v1
- Date: Mon, 2 Oct 2023 19:59:39 GMT
- Title: Adversarial Contextual Bandits Go Kernelized
- Authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili
- Abstract summary: We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and achieve near-optimal regret guarantees.
- Score: 21.007410990554522
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study a generalization of the problem of online learning in adversarial
linear contextual bandits by incorporating loss functions that belong to a
reproducing kernel Hilbert space, which allows for a more flexible modeling of
complex decision-making scenarios. We propose a computationally efficient
algorithm that makes use of a new optimistically biased estimator for the loss
functions and achieves near-optimal regret guarantees under a variety of
eigenvalue decay assumptions made on the underlying kernel. Specifically, under
the assumption of polynomial eigendecay with exponent $c>1$, the regret is
$\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number
of rounds and $K$ the number of actions. Furthermore, when the eigendecay
follows an exponential pattern, we achieve an even tighter regret bound of
$\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special
cases where lower bounds are known at all, and match the best known upper
bounds available for the more well-studied stochastic counterpart of our
problem.
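As a concrete reading of the polynomial-decay rate above (the particular values of $c$ are illustrative, not taken from the paper), plugging exponents into $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$ gives

$$c=2:\ \widetilde{O}(KT^{3/4}), \qquad c=4:\ \widetilde{O}(KT^{5/8}), \qquad c\to\infty:\ \widetilde{O}(KT^{1/2}),$$

so faster eigendecay pushes the exponent toward the $\widetilde{O}(\sqrt{T})$ rate attained under exponential eigendecay.

The "optimistically biased estimator" mentioned in the abstract can be illustrated, in a much simpler finite-armed setting, by an inverse-propensity loss estimate with an exploration bonus subtracted. The sketch below is a generic illustration of that idea only; the function name, the constant bonus, and the uniform sampling are assumptions for exposition, not the paper's kernelized construction:

```python
import numpy as np

def optimistic_loss_estimate(K, chosen_arm, observed_loss, probs, bonus):
    """Inverse-propensity loss estimate minus a bonus (scalar or per-arm array).

    The IPW term alone is unbiased; subtracting the bonus biases every
    estimate downward, i.e. makes it optimistic. Illustration only --
    NOT the RKHS-based estimator constructed in the paper.
    """
    est = np.zeros(K)
    est[chosen_arm] = observed_loss / probs[chosen_arm]  # IPW term
    return est - bonus  # downward (optimistic) bias

# Toy usage: uniform sampling over K arms and a small constant bonus.
rng = np.random.default_rng(0)
K = 5
probs = np.full(K, 1.0 / K)
arm = rng.choice(K, p=probs)
loss = rng.uniform()
print(optimistic_loss_estimate(K, arm, loss, probs, bonus=0.05))
```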
Related papers
- An Equivalence Between Static and Dynamic Regret Minimization [10.812831455376218]
We show that for linear losses, dynamic regret minimization is equivalent to static regret minimization in an extended decision space.
We provide an algorithm guaranteeing dynamic regret of the form $R_T(u_1,\dots,u_T)\le \widetilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-06-03T17:54:58Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical limitations of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\widetilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve a regret bound of $\widetilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the best existing $\widetilde{\mathcal{O}}(K^{2/3})$ regret to $\widetilde{\mathcal{O}}(\sqrt{K})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to learn losses and models jointly.
We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Parameter-free Regret in High Probability with Heavy Tails [45.11667443216861]
We present new algorithms for online convex optimization over unbounded domains.
We obtain parameter-free regret in high-probability given access only to potentially heavy-tailed subgradient estimates.
arXiv Detail & Related papers (2022-10-25T21:43:44Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)