Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- URL: http://arxiv.org/abs/2010.08061v1
- Date: Thu, 15 Oct 2020 23:03:35 GMT
- Title: Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- Authors: Xuedong Shang, Han Shao, Jian Qian
- Abstract summary: Multi-armed bandits are widely applied in recommender systems, where the goal is to maximize the click rate.
In this paper, we model this situation as a $K$-armed bandit problem with multiple losses.
- Score: 21.085722900446257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-armed bandits are widely applied in scenarios such as recommender systems,
where the goal is to maximize the click rate. However, more factors should
be considered, e.g., user stickiness, user growth rate, and user experience
assessment. In this paper, we model this situation as a $K$-armed bandit
problem with multiple losses. We define the relative loss vector of an arm,
whose $i$-th entry compares the arm and the optimal arm with respect to the
$i$-th loss. We study two goals: (a) finding the arm with the minimum
$\ell^\infty$-norm of relative losses at a given confidence level (which
corresponds to fixed-confidence best-arm identification); (b) minimizing the
$\ell^\infty$-norm of cumulative relative losses (which corresponds to regret
minimization). For goal (a), we derive a problem-dependent sample complexity
lower bound and discuss how to design matching algorithms. For goal (b), we
prove a regret lower bound of $\Omega(T^{2/3})$ and give a matching
algorithm.
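To make the objective concrete, here is a minimal numerical sketch of the relative loss vector and the $\ell^\infty$-norm criterion described in the abstract. The loss means are made up for illustration, and the snippet shows only the problem setup, not the authors' identification or regret-minimization algorithms.

```python
import numpy as np

# Toy instance (made-up numbers): K = 3 arms, two losses per arm.
# mu[k, i] is the expected i-th loss of arm k.
mu = np.array([[0.2, 0.8],    # arm 0: best on loss 0, poor on loss 1
               [0.6, 0.3],    # arm 1: poor on loss 0, best on loss 1
               [0.5, 0.5]])   # arm 2: balanced on both losses

# Relative loss of arm k on loss i: mu[k, i] minus the smallest i-th
# loss over all arms, so every entry of R is nonnegative.
R = mu - mu.min(axis=0)

# Goal (a) seeks the arm minimizing the l-infinity norm of its relative
# loss vector, i.e., its worst relative loss across loss dimensions.
scores = R.max(axis=1)
best_arm = int(np.argmin(scores))
print(R)                 # [[0.  0.5] [0.4 0. ] [0.3 0.2]]
print(scores, best_arm)  # [0.5 0.4 0.3] 2 -> the balanced arm wins
```

Note how neither arm that is optimal on a single loss minimizes the $\ell^\infty$-norm; the balanced arm does, which is the tension this objective captures.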
Related papers
- Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
The regret bound naturally aligns with the intuitive expectation that, in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits [8.508198765617198]
We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1}\phi T})$ regret over $T$ rounds.
We also provide an $O(\kappa^{-1}\phi \log(NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition.
arXiv Detail & Related papers (2022-09-15T00:20:38Z)
- Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs [7.125769932993104]
We consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.
At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward.
We apply the Thompson Sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm (see the sketch below).
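For orientation, here is a minimal sketch of classic risk-neutral linear Thompson Sampling for the disjoint model (one parameter vector per arm), not the paper's risk-averse variant. The callables `contexts` and `pull` and the posterior scaling `v` are hypothetical stand-ins.

```python
import numpy as np

def linear_ts_disjoint(contexts, pull, T, d, n_arms, v=0.5, seed=0):
    """Sketch of linear Thompson Sampling with a disjoint model: each arm
    keeps its own ridge-regression posterior over its parameter vector."""
    rng = np.random.default_rng(seed)
    B = [np.eye(d) for _ in range(n_arms)]    # per-arm precision matrices
    f = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted sums
    for t in range(T):
        X = contexts(t)                       # shape (n_arms, d): one context per arm
        scores = []
        for a in range(n_arms):
            B_inv = np.linalg.inv(B[a])
            theta_hat = B_inv @ f[a]          # posterior mean
            theta_tilde = rng.multivariate_normal(theta_hat, v**2 * B_inv)
            scores.append(X[a] @ theta_tilde) # sampled reward estimate
        a = int(np.argmax(scores))
        r = pull(t, a)                        # observe reward of chosen arm only
        B[a] += np.outer(X[a], X[a])
        f[a] += r * X[a]
    return [np.linalg.inv(B[a]) @ f[a] for a in range(n_arms)]
```

A risk-averse variant would score arms by a mean-variance criterion rather than the sampled mean alone.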
arXiv Detail & Related papers (2022-06-24T18:48:35Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- Scale Free Adversarial Multi Armed Bandits [13.757095663704858]
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem.
We design a Follow The Regularized Leader (FTRL) algorithm, which comes with the first scale-free regret guarantee for MAB (a classic FTRL instance is sketched below).
We also develop a new technique for obtaining local-norm lower bounds for Bregman divergences.
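As background for the FTRL approach mentioned above, here is a minimal sketch of the classic entropic-regularizer instance (which reduces to Exp3-style exponential weights) with importance-weighted loss estimates. It assumes losses in $[0,1]$ and is not the paper's scale-free algorithm; `loss_fn` and `eta` are hypothetical inputs.

```python
import numpy as np

def ftrl_entropic_bandit(loss_fn, n_arms, T, eta=0.1, seed=0):
    """FTRL with a negative-entropy regularizer over importance-weighted
    loss estimates; with this regularizer the FTRL solution is the
    exponential-weights distribution."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(n_arms)                       # cumulative loss estimates
    for t in range(T):
        w = np.exp(-eta * (L_hat - L_hat.min()))   # numerically stabilized weights
        p = w / w.sum()                            # FTRL / exponential-weights play
        a = rng.choice(n_arms, p=p)
        loss = loss_fn(t, a)                       # observe loss of pulled arm only
        L_hat[a] += loss / p[a]                    # unbiased importance-weighted estimate
    return L_hat
```

The scale-free setting drops the bounded-loss assumption, which is exactly what makes a fixed `eta` like the one above inadequate there.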
arXiv Detail & Related papers (2021-06-08T21:26:57Z)
- Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
- Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy (sketched below).
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
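For reference, here is a minimal sketch of the standard single-play Inverse Gap Weighting rule that the summary mentions; the learning-rate parameter `gamma` is a hypothetical input, and the paper's top-$k$ extreme extension is more involved than this.

```python
import numpy as np

def igw_distribution(y_hat, gamma):
    """Standard Inverse Gap Weighting: turn reward estimates y_hat into a
    sampling distribution that explores arms with small estimated gaps."""
    n_arms = len(y_hat)
    best = int(np.argmax(y_hat))
    p = np.zeros(n_arms)
    for a in range(n_arms):
        if a != best:
            # Probability decays with the estimated gap to the greedy arm.
            p[a] = 1.0 / (n_arms + gamma * (y_hat[best] - y_hat[a]))
    p[best] = 1.0 - p.sum()   # remaining mass goes to the greedy arm
    return p

# Usage: p = igw_distribution(np.array([0.9, 0.5, 0.4]), gamma=10.0)
```

Larger `gamma` concentrates play on the greedy arm; `gamma` is typically tuned to the regression oracle's error rate.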
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm.
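For context, here is a minimal sketch of the classic successive-elimination baseline for $(\epsilon,\delta)$-PAC best-arm identification, not the paper's new optimal algorithm; the callable `pull` is a hypothetical reward oracle, and rewards are assumed to lie in $[0,1]$.

```python
import math

def successive_elimination(pull, n_arms, epsilon, delta):
    """Classic successive elimination: sample every active arm once per
    round, drop arms whose UCB falls below the best LCB, and stop once
    all survivors are provably epsilon-optimal (w.p. at least 1 - delta)."""
    active = list(range(n_arms))
    means = [0.0] * n_arms
    t = 0                                           # pulls per active arm so far
    while True:
        t += 1
        for a in active:
            means[a] += (pull(a) - means[a]) / t    # running mean
        # Confidence radius from a union bound over arms and rounds.
        rad = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        best_lcb = max(means[a] for a in active) - rad
        active = [a for a in active if means[a] + rad >= best_lcb]
        if len(active) == 1 or 4.0 * rad <= epsilon:
            return active[0]                        # any survivor is eps-optimal
```

The expected sample complexity of this baseline scales as $O\big(\sum_a \Delta_a^{-2}\log(n/\delta)\big)$ up to log factors, which is the benchmark such papers improve on.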
arXiv Detail & Related papers (2020-06-20T20:21:33Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)