Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- URL: http://arxiv.org/abs/2010.08061v1
- Date: Thu, 15 Oct 2020 23:03:35 GMT
- Title: Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- Authors: Xuedong Shang, Han Shao, Jian Qian
- Abstract summary: Multi-armed bandits are widely applied in recommender systems, where the goal is to maximize the click rate.
In this paper, we model this situation as a $K$-armed bandit problem with multiple losses.
- Score: 21.085722900446257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-armed bandits are widely applied in scenarios such as recommender systems,
where the goal is to maximize the click rate. However, more factors should
be considered, e.g., user stickiness, user growth rate, and user experience
assessment. In this paper, we model this situation as a $K$-armed bandit
problem with multiple losses. We define the relative loss vector of an arm,
whose $i$-th entry compares the arm and the optimal arm with respect to the
$i$-th loss. We study two goals: (a) finding the arm with the minimum
$\ell^\infty$-norm of relative losses at a given confidence level (which
corresponds to fixed-confidence best-arm identification); (b) minimizing the
$\ell^\infty$-norm of cumulative relative losses (which corresponds to regret
minimization). For goal (a), we derive a problem-dependent sample complexity
lower bound and discuss how to design matching algorithms. For goal (b), we
prove a regret lower bound of $\Omega(T^{2/3})$ and give a matching
algorithm.
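To make the objective concrete, here is a minimal numerical sketch of the relative loss vector and the $\ell^\infty$-norm criterion described in the abstract. The loss means are made up for illustration, and the snippet shows only the problem setup, not the authors' identification or regret-minimization algorithms.

```python
import numpy as np

# Toy instance (made-up numbers): K = 3 arms, two losses per arm.
# mu[k, i] is the expected i-th loss of arm k.
mu = np.array([[0.2, 0.8],    # arm 0: best on loss 0, poor on loss 1
               [0.6, 0.3],    # arm 1: poor on loss 0, best on loss 1
               [0.5, 0.5]])   # arm 2: balanced on both losses

# Relative loss of arm k on loss i: mu[k, i] minus the smallest i-th
# loss over all arms, so every entry of R is nonnegative.
R = mu - mu.min(axis=0)

# Goal (a) seeks the arm minimizing the l-infinity norm of its relative
# loss vector, i.e., its worst relative loss across loss dimensions.
scores = R.max(axis=1)
best_arm = int(np.argmin(scores))
print(R)                 # [[0.  0.5] [0.4 0. ] [0.3 0.2]]
print(scores, best_arm)  # [0.5 0.4 0.3] 2 -> the balanced arm wins
```

Note how neither arm that is optimal on a single loss minimizes the $\ell^\infty$-norm; the balanced arm does, which is the tension this objective captures.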
Related papers
- Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
The regret bound naturally aligns with the intuitive expectation that, in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits [8.508198765617198]
We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1}\phi T})$ regret over $T$ rounds.
We also provide an $O(\kappa^{-1}\phi \log(NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition.
arXiv Detail & Related papers (2022-09-15T00:20:38Z)
- Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs [7.125769932993104]
We consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.
At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward.
We apply the Thompson Sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm (see the sketch below).
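For orientation, here is a minimal sketch of classic risk-neutral linear Thompson Sampling for the disjoint model (one parameter vector per arm), not the paper's risk-averse variant. The callables `contexts` and `pull` and the posterior scaling `v` are hypothetical stand-ins.

```python
import numpy as np

def linear_ts_disjoint(contexts, pull, T, d, n_arms, v=0.5, seed=0):
    """Sketch of linear Thompson Sampling with a disjoint model: each arm
    keeps its own ridge-regression posterior over its parameter vector."""
    rng = np.random.default_rng(seed)
    B = [np.eye(d) for _ in range(n_arms)]    # per-arm precision matrices
    f = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted sums
    for t in range(T):
        X = contexts(t)                       # shape (n_arms, d): one context per arm
        scores = []
        for a in range(n_arms):
            B_inv = np.linalg.inv(B[a])
            theta_hat = B_inv @ f[a]          # posterior mean
            theta_tilde = rng.multivariate_normal(theta_hat, v**2 * B_inv)
            scores.append(X[a] @ theta_tilde) # sampled reward estimate
        a = int(np.argmax(scores))
        r = pull(t, a)                        # observe reward of chosen arm only
        B[a] += np.outer(X[a], X[a])
        f[a] += r * X[a]
    return [np.linalg.inv(B[a]) @ f[a] for a in range(n_arms)]
```

A risk-averse variant would score arms by a mean-variance criterion rather than the sampled mean alone.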
arXiv Detail & Related papers (2022-06-24T18:48:35Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- Scale Free Adversarial Multi Armed Bandits [13.757095663704858]
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem.
We design a Follow The Regularized Leader (FTRL) algorithm, which comes with the first scale-free regret guarantee for MAB (a classic FTRL instance is sketched below).
We also develop a new technique for obtaining local-norm lower bounds for Bregman divergences.
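As background for the FTRL approach mentioned above, here is a minimal sketch of the classic entropic-regularizer instance (which reduces to Exp3-style exponential weights) with importance-weighted loss estimates. It assumes losses in $[0,1]$ and is not the paper's scale-free algorithm; `loss_fn` and `eta` are hypothetical inputs.

```python
import numpy as np

def ftrl_entropic_bandit(loss_fn, n_arms, T, eta=0.1, seed=0):
    """FTRL with a negative-entropy regularizer over importance-weighted
    loss estimates; with this regularizer the FTRL solution is the
    exponential-weights distribution."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(n_arms)                       # cumulative loss estimates
    for t in range(T):
        w = np.exp(-eta * (L_hat - L_hat.min()))   # numerically stabilized weights
        p = w / w.sum()                            # FTRL / exponential-weights play
        a = rng.choice(n_arms, p=p)
        loss = loss_fn(t, a)                       # observe loss of pulled arm only
        L_hat[a] += loss / p[a]                    # unbiased importance-weighted estimate
    return L_hat
```

The scale-free setting drops the bounded-loss assumption, which is exactly what makes a fixed `eta` like the one above inadequate there.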
arXiv Detail & Related papers (2021-06-08T21:26:57Z)
- Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
- Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy (sketched below).
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
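For reference, here is a minimal sketch of the standard single-play Inverse Gap Weighting rule that the summary mentions; the learning-rate parameter `gamma` is a hypothetical input, and the paper's top-$k$ extreme extension is more involved than this.

```python
import numpy as np

def igw_distribution(y_hat, gamma):
    """Standard Inverse Gap Weighting: turn reward estimates y_hat into a
    sampling distribution that explores arms with small estimated gaps."""
    n_arms = len(y_hat)
    best = int(np.argmax(y_hat))
    p = np.zeros(n_arms)
    for a in range(n_arms):
        if a != best:
            # Probability decays with the estimated gap to the greedy arm.
            p[a] = 1.0 / (n_arms + gamma * (y_hat[best] - y_hat[a]))
    p[best] = 1.0 - p.sum()   # remaining mass goes to the greedy arm
    return p

# Usage: p = igw_distribution(np.array([0.9, 0.5, 0.4]), gamma=10.0)
```

Larger `gamma` concentrates play on the greedy arm; `gamma` is typically tuned to the regression oracle's error rate.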
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm.
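For context, here is a minimal sketch of the classic successive-elimination baseline for $(\epsilon,\delta)$-PAC best-arm identification, not the paper's new optimal algorithm; the callable `pull` is a hypothetical reward oracle, and rewards are assumed to lie in $[0,1]$.

```python
import math

def successive_elimination(pull, n_arms, epsilon, delta):
    """Classic successive elimination: sample every active arm once per
    round, drop arms whose UCB falls below the best LCB, and stop once
    all survivors are provably epsilon-optimal (w.p. at least 1 - delta)."""
    active = list(range(n_arms))
    means = [0.0] * n_arms
    t = 0                                           # pulls per active arm so far
    while True:
        t += 1
        for a in active:
            means[a] += (pull(a) - means[a]) / t    # running mean
        # Confidence radius from a union bound over arms and rounds.
        rad = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        best_lcb = max(means[a] for a in active) - rad
        active = [a for a in active if means[a] + rad >= best_lcb]
        if len(active) == 1 or 4.0 * rad <= epsilon:
            return active[0]                        # any survivor is eps-optimal
```

The expected sample complexity of this baseline scales as $O\big(\sum_a \Delta_a^{-2}\log(n/\delta)\big)$ up to log factors, which is the benchmark such papers improve on.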
arXiv Detail & Related papers (2020-06-20T20:21:33Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)