Non-Stationary Lipschitz Bandits
- URL: http://arxiv.org/abs/2505.18871v1
- Date: Sat, 24 May 2025 21:23:19 GMT
- Title: Non-Stationary Lipschitz Bandits
- Authors: Nicolas Nguyen, Solenne Gaucher, Claire Vernade
- Abstract summary: We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function.
- Score: 10.073375215995611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\widetilde{\mathcal{O}}(\tilde{L}^{1/3}T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.
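To make the mechanism concrete, below is a minimal Python sketch of Lipschitz bandit play on a fixed dyadic discretization of $[0,1]$ combined with a crude drift test. It is an illustration only: the paper's actual algorithm uses a hierarchical (multi-scale) discretization and a precise significant-shift statistic, and every name and constant here (`Cell`, `run_epoch`, the sample thresholds) is hypothetical.

```python
import math
import random

# Illustrative only: a fixed dyadic grid over [0, 1] plus a crude drift
# test. The paper's algorithm is hierarchical (multi-scale) and uses a
# precise significant-shift statistic; all constants here are hypothetical.

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.n, self.mean = 0, 0.0
        self.baseline = None                 # early mean, frozen as a reference

    def update(self, reward):
        self.n += 1
        self.mean += (reward - self.mean) / self.n
        if self.n == 32:                     # freeze a baseline once warmed up
            self.baseline = self.mean

    def ucb(self, t, lip):
        if self.n == 0:
            return float("inf")
        bonus = math.sqrt(2.0 * math.log(t + 2) / self.n)
        return self.mean + bonus + lip * (self.hi - self.lo)  # discretization slack

def run_epoch(pull, horizon, lip=1.0, depth=4):
    """Play `horizon` rounds; return True if a reward drift is detected."""
    cells = [Cell(i / 2**depth, (i + 1) / 2**depth) for i in range(2**depth)]
    for t in range(horizon):
        cell = max(cells, key=lambda c: c.ucb(t, lip))
        x = random.uniform(cell.lo, cell.hi)          # play inside the chosen cell
        cell.update(pull(x))
        if cell.baseline is not None and cell.n > 64:
            if abs(cell.mean - cell.baseline) > 4.0 / math.sqrt(cell.n):
                return True                           # large deviation: restart
    return False
```

A caller would re-invoke `run_epoch` whenever it returns `True`, mimicking a restart after a detected shift.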
Related papers
- Quantum Lipschitz Bandits [6.167074802065416]
The Lipschitz bandit is a key variant of the bandit problem in which the expected reward function satisfies a Lipschitz condition. We introduce the first quantum Lipschitz bandit algorithms to address the challenges of continuous action spaces and non-linear reward functions.
arXiv Detail & Related papers (2025-04-03T03:39:04Z) - Linear Bandits with Partially Observable Features [35.08645010112184]
We introduce a novel linear bandit problem with partially observable features, resulting in partial reward information and spurious estimates. Without properly addressing the latent part, the regret can grow linearly in the decision horizon $T$, since the latent features' influence on rewards is unknown. We propose a novel analysis to handle the latent features and an algorithm that achieves sublinear regret.
arXiv Detail & Related papers (2025-02-10T04:15:38Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted OFUL$^+$ and Restarted SAVE$^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
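As a schematic of the restart mechanism (not the paper's Restarted Weighted OFUL$^+$ or Restarted SAVE$^+$, which tune the schedule to $B_K$ and $V_K$), a ridge-regression UCB bandit can simply forget its statistics every $H$ rounds; all names and defaults below are illustrative.

```python
import numpy as np

# Schematic restart wrapper around a ridge-regression UCB (LinUCB-style)
# bandit. The papers tie the epoch length to B_K and V_K; here H is a plain
# hyperparameter, so this illustrates restarting, not the papers' tuning.

def restarted_linucb(contexts, pull, K, H=200, lam=1.0, beta=1.0):
    d = contexts(0).shape[1]
    A, b = lam * np.eye(d), np.zeros(d)      # per-epoch ridge statistics
    for k in range(K):
        if k > 0 and k % H == 0:             # restart: forget stale data
            A, b = lam * np.eye(d), np.zeros(d)
        X = contexts(k)                      # rows = candidate action features
        theta = np.linalg.solve(A, b)
        Ainv = np.linalg.inv(A)
        widths = np.sqrt(np.einsum("ij,jk,ik->i", X, Ainv, X))
        x = X[int(np.argmax(X @ theta + beta * widths))]
        A += np.outer(x, x)                  # update with the chosen action
        b += pull(k, x) * x
```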
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
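A minimal sketch of the weighting idea, under the common reading that the weights are exponential discount factors applied to a ridge estimator; `gamma` and `lam` below are illustrative defaults, not the paper's tuned choices.

```python
import numpy as np

# Exponentially weighted (discounted) ridge estimator: each past sample is
# down-weighted by gamma per round, so the estimate tracks a drifting
# parameter. gamma and lam are illustrative, not the paper's choices.

def discounted_ridge(xs, rs, gamma=0.99, lam=1.0):
    d = xs.shape[1]
    A, b = lam * np.eye(d), np.zeros(d)
    for x, r in zip(xs, rs):
        # discount past statistics, then add the new sample; the extra
        # (1 - gamma) * lam * I term keeps the regularizer from decaying away
        A = gamma * A + np.outer(x, x) + (1.0 - gamma) * lam * np.eye(d)
        b = gamma * b + r * x
    return np.linalg.solve(A, b)
```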
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Deterministic Nonsmooth Nonconvex Optimization [82.39694252205011]
We show that randomization is necessary to obtain a dimension-free algorithm. Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Smooth Non-Stationary Bandits [49.19728527803684]
We study a non-stationary bandit problem where each arm's mean reward sequence can be embedded into a $\beta$-Hölder function.
We show the first separation between the smooth (i.e., $\beta\ge 2$) and non-smooth (i.e., $\beta=1$) regimes by presenting a policy with $\tilde{O}(k^{4/5} T^{3/5})$ regret on any $k$-armed, $2$-Hölder instance.
arXiv Detail & Related papers (2023-01-29T06:03:20Z) - Efficient Minimax Optimal Global Optimization of Lipschitz Continuous
Multivariate Functions [0.0]
We show that our algorithm achieves an average regret of $O(L\sqrt{n}T^{-\frac{1}{n}})$ over the horizon for Lipschitz continuous functions.
arXiv Detail & Related papers (2022-06-06T06:28:38Z) - Cumulative Regret Analysis of the Piyavskii--Shubert Algorithm and Its
Variants for Global Optimization [0.0]
We study the problem of global optimization, where we analyze the performance of the Piyavskii--Shubert algorithm and its variants.
We show that our algorithm efficiently determines its queries and achieves nearly minimax optimal (up to logarithmic factors) cumulative regret.
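For context, the Piyavskii--Shubert method is easy to state in one dimension: maintain the piecewise-linear Lipschitz upper envelope implied by past evaluations and always query where that envelope peaks. Below is a minimal sketch for maximizing an $L$-Lipschitz $f$ on $[a,b]$; the budget `T` and tie-breaking are illustrative, and the surveyed variants differ precisely in how such queries are chosen.

```python
def piyavskii_shubert(f, a, b, L, T=50):
    """Maximize an L-Lipschitz f on [a, b] with T evaluations."""
    pts = sorted([(a, f(a)), (b, f(b))])      # (x, f(x)) pairs, sorted by x
    for _ in range(T - 2):
        best_x, best_u = None, -float("inf")
        # the upper envelope peaks once between each pair of adjacent queries
        for (x0, f0), (x1, f1) in zip(pts, pts[1:]):
            x = 0.5 * (x0 + x1) + (f1 - f0) / (2.0 * L)  # cone intersection
            u = 0.5 * (f0 + f1) + 0.5 * L * (x1 - x0)    # envelope value there
            if u > best_u:
                best_x, best_u = x, u
        pts.append((best_x, f(best_x)))       # query the most optimistic point
        pts.sort()
    return max(pts, key=lambda p: p[1])       # best evaluated point

# e.g. piyavskii_shubert(lambda x: -abs(x - 0.3), 0.0, 1.0, L=1.0)
```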
arXiv Detail & Related papers (2021-08-24T17:36:33Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
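The perturbation idea is compact enough to sketch: refit the least-squares value regression on noise-corrupted targets, then act greedily with respect to the perturbed fit. The noise scale `sigma` below is an illustrative placeholder for the paper's judiciously chosen noise.

```python
import numpy as np

# One perturbed regression step in the spirit of RLSVI: Gaussian noise on
# the regression targets plays the role of an exploration bonus.
def perturbed_lsq(features, targets, sigma=1.0, lam=1.0, rng=None):
    rng = rng or np.random.default_rng()
    noisy = targets + rng.normal(0.0, sigma, size=targets.shape)
    d = features.shape[1]
    A = features.T @ features + lam * np.eye(d)    # ridge-regularized Gram matrix
    return np.linalg.solve(A, features.T @ noisy)  # perturbed weight vector
```

Acting greedily under many such independent draws randomizes over statistically plausible value functions, which is what drives exploration.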
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Smooth Bandit Optimization: Generalization to H\"older Space [37.15553727896912]
We consider bandit optimization of a smooth reward function, where the goal is to minimize cumulative regret.
Our main result is a generalization of the reward function to the Hölder space with exponent $\alpha>1$.
We show that it achieves a regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset.
arXiv Detail & Related papers (2020-12-11T01:43:25Z)