Regret Minimization in Heavy-Tailed Bandits
- URL: http://arxiv.org/abs/2102.03734v1
- Date: Sun, 7 Feb 2021 07:46:24 GMT
- Title: Regret Minimization in Heavy-Tailed Bandits
- Authors: Shubhada Agrawal, Sandeep Juneja, Wouter M. Koolen
- Abstract summary: We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators.
- Score: 12.272975892517039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the classic regret-minimization problem in the stochastic
multi-armed bandit setting when the arm-distributions are allowed to be
heavy-tailed. Regret minimization has been well studied in simpler settings of
either bounded support reward distributions or distributions that belong to a
single parameter exponential family. We work under the much weaker assumption
that the moments of order $(1+\epsilon)$ are uniformly bounded by a known
constant $B$, for some given $\epsilon > 0$. We propose an optimal algorithm that
matches the lower bound exactly in the first-order term. We also give a
finite-time bound on its regret. We show that our index concentrates faster
than the well-known truncated or trimmed empirical mean estimators for the mean
of heavy-tailed distributions. Computing our index can be computationally
demanding. To address this, we develop a batch-based algorithm that is optimal
up to a multiplicative constant depending on the batch size. We hence provide a
controlled trade-off between statistical optimality and computational cost.
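The paper's new index is not spelled out in the abstract; for orientation, here is a minimal sketch of a UCB rule built on the truncated empirical mean that the abstract compares against, under the same $(1+\epsilon)$-moment bound $B$. The truncation schedule and the constant in the confidence width follow standard truncated-mean analyses and are illustrative assumptions, not the paper's construction; each entry of `arms` is assumed to be a zero-argument callable returning one reward sample.

```python
import numpy as np

# Sketch: UCB with a truncated empirical mean for heavy-tailed rewards,
# assuming E[|X|^(1+eps)] <= B for every arm (the abstract's moment bound).
# Threshold schedule and confidence width follow the standard analysis of
# truncated means; the constants here are illustrative, not the paper's.

def truncated_mean(rewards, eps, B, delta):
    """Mean of samples, zeroing out sample i when |X_i| exceeds the
    growing threshold u_i = (B * i / log(1/delta))^(1/(1+eps))."""
    x = np.asarray(rewards, dtype=float)
    i = np.arange(1, len(x) + 1)
    u = (B * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return np.where(np.abs(x) <= u, x, 0.0).sum() / len(x)

def index(rewards, t, eps, B):
    """Optimistic index: truncated mean plus a width that shrinks like
    n^(-eps/(1+eps)) -- slower than the sub-Gaussian n^(-1/2) rate."""
    n = len(rewards)
    delta = t ** -2  # illustrative confidence level
    width = 4.0 * B ** (1.0 / (1.0 + eps)) * (
        np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, eps, B, delta) + width

def play(arms, horizon, eps, B):
    """Pull each arm once, then always pull the arm with the largest index."""
    history = [[arm()] for arm in arms]
    for t in range(len(arms) + 1, horizon + 1):
        a = int(np.argmax([index(h, t, eps, B) for h in history]))
        history[a].append(arms[a]())
    return history
```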
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z)
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm (a generic baseline sketch follows this entry).
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
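Expulli's novel sampling distribution is not described in the summary above; as a generic point of reference, here is a minimal sketch of vanilla Gaussian Thompson sampling, the baseline that such exponential-family variants refine. The unit-variance rewards and the N(0, 1) prior per arm are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Vanilla Thompson sampling for Gaussian rewards with unit variance.
# Under a N(0, 1) prior per arm, the posterior after n pulls with reward
# sum s is N(s / (n + 1), 1 / (n + 1)); we sample from it and act greedily.

def thompson_sampling(arms, horizon, rng=None):
    """arms: zero-argument callables returning one reward sample each."""
    rng = rng or np.random.default_rng(0)
    k = len(arms)
    counts = np.zeros(k)
    sums = np.zeros(k)
    rewards = []
    for _ in range(horizon):
        # Draw one posterior sample per arm, then pull the argmax.
        theta = rng.normal(sums / (counts + 1), 1.0 / np.sqrt(counts + 1))
        a = int(np.argmax(theta))
        r = arms[a]()
        counts[a] += 1
        sums[a] += r
        rewards.append(r)
    return rewards
```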
- The Fragility of Optimized Bandit Algorithms [4.390757904176221]
Much of the literature on the optimal design of bandit algorithms is based on minimizing expected regret.
We show that optimized UCB bandit designs are fragile when the problem is even slightly mis-specified.
We show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification.
arXiv Detail & Related papers (2021-09-28T10:11:06Z)
- Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards [24.983866845065926]
We consider multi-armed bandits with heavy-tailed rewards, whose $p$-th moment is bounded by a constant $\nu_p$ for $1 < p \leq 2$.
We propose a novel robust estimator which does not require $\nu_p$ as prior information.
We show that the error probability of the proposed estimator decays exponentially fast.
arXiv Detail & Related papers (2020-10-24T10:44:02Z)
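The robust estimator itself is not described in the entry above; purely as a stand-in, here is a sketch of the median-of-means estimator, a standard robust mean estimator for heavy-tailed samples whose error probability also decays exponentially (in the number of blocks). The block-count schedule below is an illustrative choice, not the paper's estimator.

```python
import numpy as np

# Median-of-means: split the samples into k blocks, average each block,
# and return the median of the block means. Heavy-tailed outliers can
# corrupt only a few blocks, so the median stays close to the true mean
# with failure probability decaying exponentially in k.

def median_of_means(samples, delta=0.05):
    """Robust mean estimate with nominal failure probability delta."""
    samples = np.asarray(samples, dtype=float)
    k = max(1, int(np.ceil(8.0 * np.log(1.0 / delta))))  # illustrative block count
    k = min(k, len(samples))                              # at most one sample per block
    blocks = np.array_split(samples, k)
    return float(np.median([b.mean() for b in blocks]))
```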
- Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes the rewards of the actions it takes.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (a small sketch follows this entry).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
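To make the lenient criterion above concrete, here is a small sketch that charges a pull only when its suboptimality gap is at least $\epsilon$; this thresholded form is a plain reading of the summary, while the paper states its definition via a more general gap function.

```python
import numpy as np

# Lenient regret, thresholded form: gaps smaller than eps are forgiven
# entirely, so only pulls of arms whose gap is at least eps contribute.

def lenient_regret(means, pulls, eps):
    """means: true mean of each arm; pulls: sequence of arm indices played."""
    means = np.asarray(means, dtype=float)
    gaps = means.max() - means
    return float(sum(gaps[a] for a in pulls if gaps[a] >= eps))

# With eps = 0.1, pulling a near-optimal arm (gap 0.05) is free;
# only the single pull of the gap-0.4 arm is charged.
print(lenient_regret([0.9, 0.85, 0.5], pulls=[1, 1, 2], eps=0.1))  # 0.4
```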
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called Optimistic Pessimistic Linear Bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
- Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between the regret bounds of GP bandit algorithms and the complexity of their posterior distributions.
We observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
arXiv Detail & Related papers (2020-03-23T21:05:15Z)