Adversarial Combinatorial Bandits with General Non-linear Reward
Functions
- URL: http://arxiv.org/abs/2101.01301v1
- Date: Tue, 5 Jan 2021 00:56:27 GMT
- Title: Adversarial Combinatorial Bandits with General Non-linear Reward
Functions
- Authors: Xi Chen and Yanjun Han and Yining Wang
- Abstract summary: We study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit.
We show that with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial.
- Score: 29.775097827244853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we study the adversarial combinatorial bandit with a known
non-linear reward function, extending existing work on adversarial linear
combinatorial bandit. The adversarial combinatorial bandit with general
non-linear reward is an important open problem in bandit literature, and it is
still unclear whether there is a significant gap from the case of linear
reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$
arms and subsets of $K$ arms being chosen at each of $T$ time periods, the
minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward
function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$
if the reward function is not a low-degree polynomial. Both bounds are
significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the
linear case, which suggests that there is a fundamental gap between the linear
and non-linear reward structures. Our result also finds applications to
the adversarial assortment optimization problem in online recommendation. We show
that in the worst case of the adversarial assortment problem, the optimal algorithm
must treat each of the $\binom{N}{K}$ possible assortments as independent.
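To make the size of this gap concrete, here is a small illustrative sketch (not from the paper) that compares the orders of the two regret bounds for hypothetical values of $N$, $K$, $T$, and $d$, ignoring constants and logarithmic factors; using $N \cdot K$ as the polynomial in the linear bound is likewise only an illustrative choice.

```python
# Illustrative comparison of the regret *orders* stated in the abstract.
# Constants and logarithmic factors are ignored; N, K, T, d are hypothetical.
import math

N, K, T = 50, 10, 100_000  # arms, subset size, time horizon (hypothetical values)

def nonlinear_regret_order(d: int) -> float:
    """~ sqrt(N^d * T): minimax rate for a d-degree polynomial reward with d < K."""
    return math.sqrt(N ** d * T)

def linear_regret_order() -> float:
    """~ sqrt(poly(N, K) * T), with N*K as one illustrative choice of poly(N, K)."""
    return math.sqrt(N * K * T)

print(f"linear reward    : ~{linear_regret_order():.3e}")
for d in (2, 3, K):  # d = K stands in for the general (non-low-degree) case ~ sqrt(N^K T)
    print(f"degree-{d:<2d} reward : ~{nonlinear_regret_order(d):.3e}")
```

Even for these modest values, the degree-$K$ rate exceeds the linear-case rate by several orders of magnitude, which is the kind of gap the abstract refers to.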
Related papers
- Nash Regret Guarantees for Linear Bandits [12.494184403263338]
We develop an algorithm that achieves a Nash regret of $O\left(\sqrt{\frac{d\nu}{T}}\,\log(T|X|)\right)$.
In addition, addressing linear bandit instances in which the set of arms $X$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left(\frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}}\log(T)\right)$.
arXiv Detail & Related papers (2023-10-03T12:58:10Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$) dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - DART: aDaptive Accept RejecT for non-linear top-K subset identification [34.68931784199599]
We consider the bandit problem of selecting $K$ out of $N$ arms at each time step.
We present a novel algorithm for this setting that neither uses individual arm feedback nor requires linearity of the reward function.
Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds.
arXiv Detail & Related papers (2020-11-16T02:10:06Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion (Abbasi-Yadkori et al., 2012) and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)