Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy
- URL: http://arxiv.org/abs/2102.07800v1
- Date: Mon, 15 Feb 2021 19:10:52 GMT
- Title: Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy
- Authors: Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster,
Daniel Hill, Inderjit Dhillon
- Abstract summary: We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$.
- Score: 71.17938026619068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by modern applications, such as online advertisement and
recommender systems, we study the top-$k$ extreme contextual bandits problem,
where the total number of arms can be enormous, and the learner is allowed to
select $k$ arms and observe all or some of the rewards for the chosen arms. We
first propose an algorithm for the non-extreme realizable setting, utilizing
the Inverse Gap Weighting strategy for selecting multiple arms. We show that
our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log
(|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is
the class containing the regression function, while only requiring
$\tilde{O}(A)$ computation per time step. In the extreme setting, where the
total number of arms can be in the millions, we propose a practically-motivated
arm hierarchy model that induces a certain structure in mean rewards to ensure
statistical and computational efficiency. The hierarchical structure allows for
an exponential reduction in the number of relevant arms for each context, thus
resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log
(|\mathcal{F}|T)})$. Finally, we implement our algorithm using a hierarchical
linear function class and show superior performance with respect to well-known
benchmarks on simulated bandit feedback experiments using extreme multi-label
classification datasets. On a dataset with three million arms, our reduction
scheme has an average inference time of only 7.9 milliseconds, which is a 100x
improvement.
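For concreteness, below is a minimal sketch of how the Inverse Gap Weighting (IGW) rule mentioned in the abstract can be used to pick $k$ distinct arms from a vector of estimated rewards. The function names, the exploration parameter `gamma`, and the pick-one-arm-at-a-time sampling scheme are assumptions made for illustration; this is not the paper's exact top-$k$ algorithm or its hierarchical reduction.

```python
import numpy as np

def igw_distribution(scores: np.ndarray, gamma: float) -> np.ndarray:
    """Standard (single-arm) Inverse Gap Weighting over estimated rewards.

    Every non-greedy arm a receives mass 1 / (A + gamma * gap(a)), where
    gap(a) is the difference between the highest estimated reward and the
    estimate for a; the greedy arm keeps the remaining probability.
    """
    A = len(scores)
    best = int(np.argmax(scores))
    probs = 1.0 / (A + gamma * (scores[best] - scores))
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs

def select_top_k(scores: np.ndarray, k: int, gamma: float, rng=None):
    """Draw k distinct arms by re-applying IGW to the arms not yet chosen.

    This repeated-sampling scheme is an illustrative stand-in for the paper's
    multi-arm selection rule, not a reproduction of it.
    """
    rng = np.random.default_rng() if rng is None else rng
    remaining = np.arange(len(scores))
    chosen = []
    for _ in range(k):
        p = igw_distribution(scores[remaining], gamma)
        idx = rng.choice(len(remaining), p=p)
        chosen.append(int(remaining[idx]))
        remaining = np.delete(remaining, idx)
    return chosen

# Example: scores stand in for \hat{f}(x, a) from some regression model.
rng = np.random.default_rng(0)
scores = rng.uniform(size=20)              # A = 20 arms for this toy context
print(select_top_k(scores, k=3, gamma=100.0, rng=rng))
```

Sampling without replacement in this way keeps each non-greedy arm's selection probability inversely proportional to its estimated reward gap, which is the core exploration idea behind IGW-style algorithms.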
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem [10.994427113326996]
An instance of this problem has $k$ arms, each of whose reward function is a concave and increasing function of the number of times that arm has been pulled so far.
We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward.
We then show how to remove this assumption at the cost of an extra $O(\sqrt{k}\log k)$ approximation factor, achieving an overall $O(\sqrt{k}$...
arXiv Detail & Related papers (2024-04-01T15:55:45Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{A}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Blocking Bandits [33.14975454724348]
We consider a novel multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter.
We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm.
We design a UCB-based algorithm which is shown to have $c \log T + o(\log T)$ cumulative regret against the greedy algorithm.
arXiv Detail & Related papers (2019-07-27T20:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.