Scale Free Adversarial Multi Armed Bandits
- URL: http://arxiv.org/abs/2106.04700v1
- Date: Tue, 8 Jun 2021 21:26:57 GMT
- Title: Scale Free Adversarial Multi Armed Bandits
- Authors: Sudeep Raja Putta, Shipra Agrawal
- Abstract summary: We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem.
We design a Follow The Regularized Leader (FTRL) algorithm, which comes with the first scale-free regret guarantee for MAB.
We also develop a new technique for obtaining local-norm lower bounds for Bregman Divergences.
- Score: 13.757095663704858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem, where
the player only knows the number of arms $n$ and not the scale or magnitude of
the losses. The player observes bandit feedback about the loss vectors $l_1,\dots, l_T \in
\mathbb{R}^n$. The goal is to bound the regret as a function of $n$ and
$l_1,\dots, l_T$. We design a Follow The Regularized Leader (FTRL) algorithm,
which comes with the first scale-free regret guarantee for MAB. It uses the log
barrier regularizer, the importance-weighted estimator, an adaptive learning
rate, and an adaptive exploration parameter. In the analysis, we introduce a
simple, unifying technique for obtaining regret inequalities for FTRL and
Online Mirror Descent (OMD) on the probability simplex using potential functions
and mixed Bregmans. We also develop a new technique for obtaining local-norm
lower bounds for Bregman divergences, which are crucial in bandit regret
bounds. These tools could be of independent interest.
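As a concrete illustration of the algorithmic ingredients above, here is a minimal sketch of log-barrier FTRL with importance-weighted loss estimates. The bisection solver follows from the first-order optimality conditions of the log-barrier objective; the fixed learning rate and exploration parameter below are simplified placeholders, not the paper's adaptive tuning.

```python
import numpy as np

def log_barrier_probs(L_hat, eta, iters=100):
    """FTRL step: minimize <p, L_hat> - (1/eta) * sum(log p_i) over the simplex.
    Optimality gives p_i = 1 / (eta * (L_hat_i + lam)); bisect on lam so sum(p) = 1."""
    lo = -L_hat.min() + 1e-12              # sum(p) -> infinity as lam -> -min(L_hat)
    hi = -L_hat.min() + len(L_hat) / eta   # here every p_i <= 1/n, so sum(p) <= 1
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * (L_hat + lam))) > 1.0:
            lo = lam
        else:
            hi = lam
    p = 1.0 / (eta * (L_hat + lam))
    return p / p.sum()                     # tiny renormalization for numerical safety

def scale_free_ftrl(losses, eta=0.5, gamma=0.01):
    """Bandit loop over losses (a (T, n) array), with importance-weighted
    estimates and uniform exploration. eta and gamma are fixed here; the
    paper adapts both to the observed losses."""
    T, n = losses.shape
    L_hat, rng, total = np.zeros(n), np.random.default_rng(0), 0.0
    for t in range(T):
        p = (1 - gamma) * log_barrier_probs(L_hat, eta) + gamma / n
        arm = rng.choice(n, p=p)
        total += losses[t, arm]
        L_hat[arm] += losses[t, arm] / p[arm]   # importance-weighted estimator
    return total
```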
Related papers
- Learning for Bandits under Action Erasures [20.235500642661812]
We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels.
In our model, while the distributed agents know if an action is erased, the central learner does not.
We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures.
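To make the setup concrete, here is a small simulation of the erasure model, a sketch under stated assumptions: on an erasure the agent replays its last successfully received action, and the learner (unaware of erasures) updates its base algorithm as if its intended action were played. The paper's actual robustification scheme is more involved.

```python
import random

def run_over_erasure_channel(choose_action, update, loss_fn, T, erase_prob):
    """Hypothetical interface: choose_action(t) -> arm, update(t, arm, loss),
    loss_fn(t, arm) -> loss of the arm actually played at round t."""
    last_received = 0                      # agent's default before any delivery
    for t in range(T):
        intended = choose_action(t)
        if random.random() < erase_prob:
            played = last_received         # agent knows the action was erased
        else:
            played = intended
            last_received = intended
        # The central learner cannot tell which case occurred, so it
        # updates its base MAB algorithm with the action it intended.
        update(t, intended, loss_fn(t, played))
```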
arXiv Detail & Related papers (2024-06-26T05:03:00Z)
- Linear bandits with polylogarithmic minimax regret [8.97780713904412]
We study a noise model for linear bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector.
We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms.
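A minimal sketch of one way to instantiate such a noise model (an illustrative assumption; the paper states its own condition): the noise scale shrinks linearly with the distance between the chosen unit-sphere action and the unknown unit vector.

```python
import numpy as np

def vanishing_noise_reward(action, theta, rng):
    """action and theta are unit vectors; noise vanishes as action -> theta."""
    sigma = np.linalg.norm(action - theta)          # linear in the distance
    return action @ theta + sigma * rng.standard_normal()
```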
arXiv Detail & Related papers (2024-02-19T10:56:47Z)
- Fixed-Budget Best-Arm Identification in Sparse Linear Bandits [69.6194614504832]
We study the best-arm identification problem in sparse linear bandits under the fixed-budget setting.
We design a two-phase algorithm, Lasso and Optimal-Design (Lasso-OD) based linear best-arm identification.
We show that Lasso-OD is almost minimax optimal in the exponent.
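A hedged sketch of the two-phase idea: Lasso to recover a sparse support, then estimation restricted to that support to pick the arm. Plain least squares with uniform sampling stands in for the optimal-design phase; the sampling distributions, budget split, and regularization below are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_od_sketch(arms, pull, budget, rng, alpha=0.1, phase1_frac=0.5):
    """arms: (K, d) feature matrix; pull(i) -> noisy reward of arm i."""
    # Phase 1: uniform pulls + Lasso to estimate the support of theta.
    n1 = int(budget * phase1_frac)
    idx1 = rng.integers(len(arms), size=n1)
    y1 = np.array([pull(i) for i in idx1])
    support = np.flatnonzero(Lasso(alpha=alpha).fit(arms[idx1], y1).coef_)
    if support.size == 0:
        support = np.arange(arms.shape[1])   # fall back to the full dimension
    # Phase 2: re-estimate theta on the recovered support (least squares with
    # uniform sampling stands in for the paper's optimal design).
    idx2 = rng.integers(len(arms), size=budget - n1)
    y2 = np.array([pull(i) for i in idx2])
    theta_hat, *_ = np.linalg.lstsq(arms[idx2][:, support], y2, rcond=None)
    return int(np.argmax(arms[:, support] @ theta_hat))
```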
arXiv Detail & Related papers (2023-11-01T12:32:17Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde O(d)$ regret.
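For intuition, a tiny sketch of dueling feedback under a logistic link (one common GLM instance, assumed here): the Bernoulli variance $\sigma_t^2 = p(1-p)$ is exactly the quantity the variance-aware bound adapts to, and it vanishes when the comparison is near-deterministic.

```python
import numpy as np

def duel(phi_i, phi_j, theta, rng):
    """Binary comparison from a logistic-link GLM; returns (i beats j, sigma_t^2)."""
    p = 1.0 / (1.0 + np.exp(-(phi_i - phi_j) @ theta))   # P(arm i beats arm j)
    return rng.random() < p, p * (1.0 - p)               # outcome and its variance
```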
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
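As one concrete instance of a weighted strategy (a standard discounted ridge estimator, shown for illustration rather than as the paper's exact algorithm), recent observations receive exponentially larger weight so the estimate can track a drifting parameter:

```python
import numpy as np

def discounted_ridge(X, y, gamma=0.99, lam=1.0):
    """theta_hat = argmin sum_s gamma^(t-1-s) * (y_s - x_s^T theta)^2 + lam * ||theta||^2."""
    w = gamma ** np.arange(len(y) - 1, -1, -1)     # oldest round gets weight gamma^(t-1)
    A = (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, (X * w[:, None]).T @ y)
```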
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
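A sketch of what a variance-aware confidence interval typically looks like, in the empirical-Bernstein style (constants are illustrative, not those of BCUCB-T): the radius scales with the empirical variance instead of the worst-case range.

```python
import numpy as np

def bernstein_radius(var_hat, n_pulls, t):
    """The variance term dominates when var_hat is large; the 1/n term
    covers the low-variance regime. Constants chosen for illustration."""
    log_t = np.log(max(t, 2))
    return np.sqrt(2.0 * var_hat * log_t / n_pulls) + 3.0 * log_t / n_pulls
```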
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
- Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
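Since the analysis rests on FTRL, here is a minimal sketch of entropy-regularized FTRL (multiplicative weights) in self-play on a known zero-sum matrix game, whose averaged strategies approximate a Nash equilibrium; the paper's sample-based setting with a generative model is substantially more involved.

```python
import numpy as np

def ftrl_selfplay(A, T=5000, eta=0.05):
    """Self-play on payoff matrix A: the row player minimizes x^T A y,
    the column player maximizes it. Returns averaged strategies."""
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = np.exp(-eta * (Lx - Lx.min())); x /= x.sum()   # entropy-regularized FTRL
        y = np.exp(-eta * (Ly - Ly.min())); y /= y.sum()
        avg_x += x; avg_y += y
        Lx += A @ y            # row player's loss vector
        Ly -= A.T @ x          # column player's loss is the negated payoff
    return avg_x / T, avg_y / T
```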
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- An Experimental Design Approach for Regret Minimization in Logistic Bandits [26.674062544226636]
The main challenge of logistic bandits is reducing the dependence on a potentially large problem-dependent constant $\kappa$.
We propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general.
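For reference, a small sketch of the constant in question, under the standard definition from the logistic-bandit literature (assuming this paper uses the same one): $\kappa$ is the worst-case inverse slope of the sigmoid link over the arm set, and it can be exponentially large in $\|\theta\|$.

```python
import numpy as np

def kappa(arms, theta):
    """kappa = 1 / min over arms of mu'(x^T theta), where mu is the sigmoid."""
    z = arms @ theta
    # Numerically stable sigmoid derivative; mu'(z) is symmetric in z.
    mu_dot = np.exp(-np.abs(z)) / (1.0 + np.exp(-np.abs(z))) ** 2
    return 1.0 / mu_dot.min()
```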
arXiv Detail & Related papers (2022-02-04T21:56:40Z)
- Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses [21.085722900446257]
Multi-armed bandits are widely applied in recommender systems, for which the goal is to maximize the click rate.
In this paper, we model this situation as a $K$-armed bandit problem with multiple losses.
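A sketch of the objective under one natural reading of the setup (hedged; the indexing conventions are assumptions): each arm carries a vector loss, the player's per-dimension expected loss is compared against the best single arm for that dimension, and the target is the largest such gap.

```python
import numpy as np

def linf_relative_loss(mean_losses, p):
    """mean_losses[i, j]: expected loss of arm i in dimension j;
    p: the player's distribution over the K arms."""
    per_dim = p @ mean_losses             # player's expected loss per dimension
    best = mean_losses.min(axis=0)        # best single arm per dimension
    return np.max(per_dim - best)         # l-infinity norm of relative losses
```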
arXiv Detail & Related papers (2020-10-15T23:03:35Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
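A hedged sketch of the optimistic-pessimistic selection idea (the names and the exact form of the confidence widths are illustrative assumptions, not the paper's): be optimistic about reward but conservative about constraint cost, and choose only among arms whose pessimistic cost estimate still satisfies the constraint.

```python
import numpy as np

def oplb_action(arms, theta_hat, Vinv, beta_r, mu_hat, Winv, beta_c, tau):
    """arms: (K, d); theta_hat/mu_hat: reward/cost estimates; tau: cost budget."""
    bonus_r = beta_r * np.sqrt(np.einsum('id,dk,ik->i', arms, Vinv, arms))
    bonus_c = beta_c * np.sqrt(np.einsum('id,dk,ik->i', arms, Winv, arms))
    ucb_reward = arms @ theta_hat + bonus_r   # optimistic about reward
    ucb_cost = arms @ mu_hat + bonus_c        # pessimistic (upper bound) on cost
    safe = ucb_cost <= tau
    if not safe.any():
        return int(np.argmin(ucb_cost))       # fall back to the safest-looking arm
    idx = np.where(safe)[0]
    return int(idx[np.argmax(ucb_reward[idx])])
```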
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.