Adversarial Combinatorial Bandits with General Non-linear Reward
Functions
- URL: http://arxiv.org/abs/2101.01301v1
- Date: Tue, 5 Jan 2021 00:56:27 GMT
- Title: Adversarial Combinatorial Bandits with General Non-linear Reward
Functions
- Authors: Xi Chen and Yanjun Han and Yining Wang
- Abstract summary: We study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit.
We show that with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial.
- Score: 29.775097827244853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we study the adversarial combinatorial bandit with a known
non-linear reward function, extending existing work on adversarial linear
combinatorial bandit. The adversarial combinatorial bandit with general
non-linear reward is an important open problem in bandit literature, and it is
still unclear whether there is a significant gap from the case of linear
reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$
arms and subsets of $K$ arms being chosen at each of $T$ time periods, the
minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward
function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$
if the reward function is not a low-degree polynomial. Both bounds are
significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the
linear case, which suggests that there is a fundamental gap between the linear
and non-linear reward structures. Our result also finds applications to
the adversarial assortment optimization problem in online recommendation. We show
that in the worst case of the adversarial assortment problem, the optimal algorithm
must treat each of the $\binom{N}{K}$ possible assortments as independent.
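To make the size of this gap concrete, here is a small illustrative sketch (not from the paper) that compares the orders of the two regret bounds for hypothetical values of $N$, $K$, $T$, and $d$, ignoring constants and logarithmic factors; using $N \cdot K$ as the polynomial in the linear bound is likewise only an illustrative choice.

```python
# Illustrative comparison of the regret *orders* stated in the abstract.
# Constants and logarithmic factors are ignored; N, K, T, d are hypothetical.
import math

N, K, T = 50, 10, 100_000  # arms, subset size, time horizon (hypothetical values)

def nonlinear_regret_order(d: int) -> float:
    """~ sqrt(N^d * T): minimax rate for a d-degree polynomial reward with d < K."""
    return math.sqrt(N ** d * T)

def linear_regret_order() -> float:
    """~ sqrt(poly(N, K) * T), with N*K as one illustrative choice of poly(N, K)."""
    return math.sqrt(N * K * T)

print(f"linear reward    : ~{linear_regret_order():.3e}")
for d in (2, 3, K):  # d = K stands in for the general (non-low-degree) case ~ sqrt(N^K T)
    print(f"degree-{d:<2d} reward : ~{nonlinear_regret_order(d):.3e}")
```

Even for these modest values, the degree-$K$ rate exceeds the linear-case rate by several orders of magnitude, which is the kind of gap the abstract refers to.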
Related papers
- Nash Regret Guarantees for Linear Bandits [12.494184403263338]
We develop an algorithm that achieves a Nash regret of $O\left(\sqrt{\frac{d\nu}{T}}\,\log(T|X|)\right)$.
In addition, addressing linear bandit instances in which the set of arms $X$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left(\frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}}\log(T)\right)$.
arXiv Detail & Related papers (2023-10-03T12:58:10Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$) dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - DART: aDaptive Accept RejecT for non-linear top-K subset identification [34.68931784199599]
We consider the bandit problem of selecting $K$ out of $N$ arms at each time step.
We present a novel algorithm for this setting that neither uses individual arm feedback nor requires linearity of the reward function.
Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds.
arXiv Detail & Related papers (2020-11-16T02:10:06Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion (Abbasi-Yadkori et al., 2012) and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)