The Sliding Regret in Stochastic Bandits: Discriminating Index and
Randomized Policies
- URL: http://arxiv.org/abs/2311.18437v1
- Date: Thu, 30 Nov 2023 10:37:03 GMT
- Title: The Sliding Regret in Stochastic Bandits: Discriminating Index and
Randomized Policies
- Authors: Victor Boone
- Abstract summary: We study the one-shot behavior of no-regret algorithms for bandits.
We introduce a new notion: the sliding regret, which measures the worst pseudo-regret over a time-window of fixed length sliding to infinity.
- Score: 0.8158530638728501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the one-shot behavior of no-regret algorithms for
stochastic bandits. Although many algorithms are known to be asymptotically
optimal with respect to the expected regret, over a single run, their
pseudo-regret seems to follow one of two tendencies: it is either smooth or
bumpy. To measure this tendency, we introduce a new notion: the sliding regret,
which measures the worst pseudo-regret over a time-window of fixed length
sliding to infinity. We show that randomized methods (e.g. Thompson Sampling
and MED) have optimal sliding regret, while index policies, although possibly
asymptotically optimal for the expected regret, have the worst possible sliding
regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB,
MOSS, IMED, etc.). We further analyze the average bumpiness of the pseudo-regret
of index policies via the regret of exploration, which we show to be suboptimal
as well.
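To make the distinction concrete, here is a minimal simulation sketch (not the paper's code): it runs a two-armed Bernoulli bandit under UCB1 and Beta-Bernoulli Thompson Sampling and reports, besides the final pseudo-regret, a finite-horizon proxy of the sliding regret, namely the worst increase of the pseudo-regret over any window of fixed length. The window size, burn-in, horizon and arm means are illustrative choices, and the finite-horizon proxy only approximates the limiting quantity studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(policy, means, horizon):
    """Run one Bernoulli bandit trajectory and return the pseudo-regret path.

    Pseudo-regret after t rounds = sum over rounds of (best mean - mean of the
    arm played), i.e. the cumulative gap paid, not the realized reward loss.
    """
    best = max(means)
    counts = np.zeros(len(means))
    sums = np.zeros(len(means))
    pseudo_regret = np.zeros(horizon)
    paid = 0.0
    for t in range(horizon):
        arm = policy(t, counts, sums)
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        paid += best - means[arm]
        pseudo_regret[t] = paid
    return pseudo_regret

def ucb1(t, counts, sums):
    # Classical UCB1 index policy: pull each arm once, then maximize the index.
    if np.any(counts == 0):
        return int(np.argmin(counts))
    index = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
    return int(np.argmax(index))

def thompson(t, counts, sums):
    # Beta-Bernoulli Thompson Sampling: sample a mean per arm, play the argmax.
    samples = rng.beta(1.0 + sums, 1.0 + counts - sums)
    return int(np.argmax(samples))

def sliding_regret(pseudo_regret, window, burn_in=0):
    # Worst increase of the pseudo-regret over any window of the given length.
    # The paper lets the window slide to infinity; on a finite run we simply
    # report the worst window after an arbitrary burn-in.
    r = pseudo_regret[burn_in:]
    return float(np.max(r[window:] - r[:-window]))

means = [0.5, 0.45]  # illustrative two-armed instance
for name, policy in [("UCB1", ucb1), ("Thompson Sampling", thompson)]:
    pr = run_bandit(policy, means, horizon=20_000)
    print(f"{name}: final pseudo-regret = {pr[-1]:.1f}, "
          f"sliding regret (window=50) = {sliding_regret(pr, 50, burn_in=5_000):.2f}")
```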
Related papers
- Thompson Sampling for Infinite-Horizon Discounted Decision Processes [0.0]
We study the behavior of a sampling-based algorithm, called Thompson sampling.
By decomposing the standard (expected) regret, we develop a new metric, called the expected residual regret.
arXiv Detail & Related papers (2024-05-14T01:01:05Z) - Minimax Optimal Quantile and Semi-Adversarial Regret via
Root-Logarithmic Regularizers [31.102181563065844]
Quantile (and, more generally, KL) regret bounds relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic.
We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge.
arXiv Detail & Related papers (2021-10-27T22:38:52Z) - Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds
for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known rates for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z) - Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic
Optimality [25.627939043735683]
We study structured bandits for minimizing regret.
In this paper, we focus on the finite hypothesis case and ask whether one can achieve asymptotic optimality while enjoying bounded regret.
arXiv Detail & Related papers (2020-06-15T20:46:52Z) - A General Theory of the Stochastic Linear Bandit and Its Applications [8.071506311915398]
We introduce a general analysis framework and a family of algorithms for the linear bandit problem.
Our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG) that reduces the overexploration problem in OFUL.
In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS.
arXiv Detail & Related papers (2020-02-12T18:54:41Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
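As a rough illustration of a Thompson Sampling-style rule for a mean-variance objective (a sketch under assumed conventions, not the exact algorithms analyzed in the paper above), the snippet below samples a posterior mean per arm of a Bernoulli bandit, scores each arm by the sampled mean minus a risk-weighted sampled variance, and plays the best-scoring arm. The Beta(1,1) priors, the scoring convention, and the risk-tolerance parameter rho are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_variance_thompson(means, horizon=10_000, rho=1.0):
    """Thompson Sampling-style rule for a mean-variance Bernoulli bandit.

    Each round, sample a plausible mean per arm from its Beta posterior, score
    each arm by (sampled mean) - rho * (sampled variance), and play the best
    scorer. Illustrative sketch only, not the paper's exact algorithm.
    """
    successes = np.zeros(len(means))
    failures = np.zeros(len(means))
    pulls = np.zeros(len(means), dtype=int)
    for _ in range(horizon):
        mu = rng.beta(1.0 + successes, 1.0 + failures)  # posterior samples
        score = mu - rho * mu * (1.0 - mu)  # mean minus weighted Bernoulli variance
        arm = int(np.argmax(score))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Arm 1 has the higher mean but also the higher variance; a risk-neutral run
# (rho=0) concentrates on it, while a risk-averse run (rho=5) shifts pulls
# toward the lower-variance arm 0.
print(mean_variance_thompson([0.2, 0.45], rho=0.0))
print(mean_variance_thompson([0.2, 0.45], rho=5.0))
```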
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.