Regret Minimization in Heavy-Tailed Bandits
- URL: http://arxiv.org/abs/2102.03734v1
- Date: Sun, 7 Feb 2021 07:46:24 GMT
- Title: Regret Minimization in Heavy-Tailed Bandits
- Authors: Shubhada Agrawal, Sandeep Juneja, Wouter M. Koolen
- Abstract summary: We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators.
- Score: 12.272975892517039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the classic regret-minimization problem in the stochastic
multi-armed bandit setting when the arm-distributions are allowed to be
heavy-tailed. Regret minimization has been well studied in simpler settings of
either bounded support reward distributions or distributions that belong to a
single parameter exponential family. We work under the much weaker assumption
that the moments of order $(1+\epsilon)$ are uniformly bounded by a known
constant $B$, for some given $\epsilon > 0$. We propose an optimal algorithm that
matches the lower bound exactly in the first-order term. We also give a
finite-time bound on its regret. We show that our index concentrates faster
than the well-known truncated or trimmed empirical mean estimators for the mean
of heavy-tailed distributions. Computing our index can be computationally
demanding. To address this, we develop a batch-based algorithm that is optimal
up to a multiplicative constant depending on the batch size. We hence provide a
controlled trade-off between statistical optimality and computational cost.
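The paper's new index is not spelled out in the abstract; for orientation, here is a minimal sketch of a UCB rule built on the truncated empirical mean that the abstract compares against, under the same $(1+\epsilon)$-moment bound $B$. The truncation schedule and the constant in the confidence width follow standard truncated-mean analyses and are illustrative assumptions, not the paper's construction; each entry of `arms` is assumed to be a zero-argument callable returning one reward sample.

```python
import numpy as np

# Sketch: UCB with a truncated empirical mean for heavy-tailed rewards,
# assuming E[|X|^(1+eps)] <= B for every arm (the abstract's moment bound).
# Threshold schedule and confidence width follow the standard analysis of
# truncated means; the constants here are illustrative, not the paper's.

def truncated_mean(rewards, eps, B, delta):
    """Mean of samples, zeroing out sample i when |X_i| exceeds the
    growing threshold u_i = (B * i / log(1/delta))^(1/(1+eps))."""
    x = np.asarray(rewards, dtype=float)
    i = np.arange(1, len(x) + 1)
    u = (B * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return np.where(np.abs(x) <= u, x, 0.0).sum() / len(x)

def index(rewards, t, eps, B):
    """Optimistic index: truncated mean plus a width that shrinks like
    n^(-eps/(1+eps)) -- slower than the sub-Gaussian n^(-1/2) rate."""
    n = len(rewards)
    delta = t ** -2  # illustrative confidence level
    width = 4.0 * B ** (1.0 / (1.0 + eps)) * (
        np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, eps, B, delta) + width

def play(arms, horizon, eps, B):
    """Pull each arm once, then always pull the arm with the largest index."""
    history = [[arm()] for arm in arms]
    for t in range(len(arms) + 1, horizon + 1):
        a = int(np.argmax([index(h, t, eps, B) for h in history]))
        history[a].append(arms[a]())
    return history
```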
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z)
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm (a generic baseline sketch follows this entry).
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
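Expulli's novel sampling distribution is not described in the summary above; as a generic point of reference, here is a minimal sketch of vanilla Gaussian Thompson sampling, the baseline that such exponential-family variants refine. The unit-variance rewards and the N(0, 1) prior per arm are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Vanilla Thompson sampling for Gaussian rewards with unit variance.
# Under a N(0, 1) prior per arm, the posterior after n pulls with reward
# sum s is N(s / (n + 1), 1 / (n + 1)); we sample from it and act greedily.

def thompson_sampling(arms, horizon, rng=None):
    """arms: zero-argument callables returning one reward sample each."""
    rng = rng or np.random.default_rng(0)
    k = len(arms)
    counts = np.zeros(k)
    sums = np.zeros(k)
    rewards = []
    for _ in range(horizon):
        # Draw one posterior sample per arm, then pull the argmax.
        theta = rng.normal(sums / (counts + 1), 1.0 / np.sqrt(counts + 1))
        a = int(np.argmax(theta))
        r = arms[a]()
        counts[a] += 1
        sums[a] += r
        rewards.append(r)
    return rewards
```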
- The Fragility of Optimized Bandit Algorithms [4.390757904176221]
Much of the literature on the optimal design of bandit algorithms is based on minimizing expected regret.
We show that optimized UCB bandit designs are fragile when the problem is even slightly mis-specified.
We show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification.
arXiv Detail & Related papers (2021-09-28T10:11:06Z)
- Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards [24.983866845065926]
We consider multi-armed bandits with heavy-tailed rewards, whose $p$-th moment is bounded by a constant $\nu_p$ for $1 < p \leq 2$.
We propose a novel robust estimator which does not require $\nu_p$ as prior information.
We show that the error probability of the proposed estimator decays exponentially fast.
arXiv Detail & Related papers (2020-10-24T10:44:02Z)
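The robust estimator itself is not described in the entry above; purely as a stand-in, here is a sketch of the median-of-means estimator, a standard robust mean estimator for heavy-tailed samples whose error probability also decays exponentially (in the number of blocks). The block-count schedule below is an illustrative choice, not the paper's estimator.

```python
import numpy as np

# Median-of-means: split the samples into k blocks, average each block,
# and return the median of the block means. Heavy-tailed outliers can
# corrupt only a few blocks, so the median stays close to the true mean
# with failure probability decaying exponentially in k.

def median_of_means(samples, delta=0.05):
    """Robust mean estimate with nominal failure probability delta."""
    samples = np.asarray(samples, dtype=float)
    k = max(1, int(np.ceil(8.0 * np.log(1.0 / delta))))  # illustrative block count
    k = min(k, len(samples))                              # at most one sample per block
    blocks = np.array_split(samples, k)
    return float(np.median([b.mean() for b in blocks]))
```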
- Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes the rewards of the actions it takes.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (a small sketch follows this entry).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
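To make the lenient criterion above concrete, here is a small sketch that charges a pull only when its suboptimality gap is at least $\epsilon$; this thresholded form is a plain reading of the summary, while the paper states its definition via a more general gap function.

```python
import numpy as np

# Lenient regret, thresholded form: gaps smaller than eps are forgiven
# entirely, so only pulls of arms whose gap is at least eps contribute.

def lenient_regret(means, pulls, eps):
    """means: true mean of each arm; pulls: sequence of arm indices played."""
    means = np.asarray(means, dtype=float)
    gaps = means.max() - means
    return float(sum(gaps[a] for a in pulls if gaps[a] >= eps))

# With eps = 0.1, pulling a near-optimal arm (gap 0.05) is free;
# only the single pull of the gap-0.4 arm is charged.
print(lenient_regret([0.9, 0.85, 0.5], pulls=[1, 1, 2], eps=0.1))  # 0.4
```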
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called Optimistic Pessimistic Linear Bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
- Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between the regret bounds of GP bandit algorithms and the complexity of their posterior distributions.
We observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
arXiv Detail & Related papers (2020-03-23T21:05:15Z)