Optimal Algorithms for Latent Bandits with Cluster Structure
- URL: http://arxiv.org/abs/2301.07040v3
- Date: Tue, 11 Jul 2023 07:04:02 GMT
- Title: Optimal Algorithms for Latent Bandits with Cluster Structure
- Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
- Abstract summary: We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
- Score: 50.44722775727619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of latent bandits with cluster structure where there
are multiple users, each with an associated multi-armed bandit problem. These
users are grouped into \emph{latent} clusters such that the mean reward vectors
of users within the same cluster are identical. At each round, a user, selected
uniformly at random, pulls an arm and observes a corresponding noisy reward.
The goal of the users is to maximize their cumulative rewards. This problem is
central to practical recommendation systems and has received wide attention of
late \cite{gentile2014online, maillard2014latent}. Now, if each user acts
independently, then they would have to explore each arm independently and a
regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M},
\mathsf{N}$ are the number of arms and users, respectively. Instead, we propose
LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the
latent cluster structure to provide the minimax optimal regret of
$\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of
clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such a
strong regret bound. LATTICE is based on a careful exploitation of arm
information within a cluster while simultaneously clustering users.
Furthermore, it is computationally efficient and requires only
$O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all
$\mathsf{T}$ rounds.
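The high-level recipe in the abstract (explore uniformly, estimate the low-rank reward matrix, cluster users, then exploit pooled within-cluster estimates) can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters ($\mathsf{M}=5$ arms, $\mathsf{N}=12$ users, two clusters), not the actual LATTICE algorithm: the matrix-completion oracle is stood in for by naive per-entry empirical means, and the clustering step is a simple greedy threshold on row distances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: N users in C latent clusters over M arms.
# Users in the same cluster share an identical mean-reward vector,
# so the true N x M reward matrix has rank at most C.
M, N, C, T = 5, 12, 2, 6000
cluster_of = rng.integers(0, C, size=N)
cluster_means = rng.uniform(0.1, 0.9, size=(C, M))
means = cluster_means[cluster_of]

# Phase 1 (explore): play arms uniformly and record noisy rewards.
explore_rounds = T // 3
counts = np.zeros((N, M))
sums = np.zeros((N, M))
for _ in range(explore_rounds):
    u = rng.integers(N)                  # a user arrives uniformly at random
    a = rng.integers(M)
    counts[u, a] += 1
    sums[u, a] += means[u, a] + rng.normal(0, 0.1)
est = np.divide(sums, np.maximum(counts, 1))

# Cluster users by distance between estimated reward rows
# (a crude stand-in for LATTICE's matrix-completion + clustering step).
labels = -np.ones(N, dtype=int)
next_label = 0
for u in range(N):
    if labels[u] >= 0:
        continue
    labels[u] = next_label
    for v in range(u + 1, N):
        if labels[v] < 0 and np.linalg.norm(est[u] - est[v]) < 0.5:
            labels[v] = next_label
    next_label += 1

# Phase 2 (exploit): pool data within each estimated cluster and
# commit to that cluster's empirically best arm.
best_arm = {}
for c in range(next_label):
    members = labels == c
    pooled = sums[members].sum(0) / np.maximum(counts[members].sum(0), 1)
    best_arm[c] = int(pooled.argmax())

correct = sum(best_arm[labels[u]] == int(means[u].argmax()) for u in range(N))
print(f"{correct}/{N} users routed to their optimal arm")
```

Pooling within clusters is what buys the improvement the abstract describes: each arm's mean is estimated from the samples of all $\approx \mathsf{N}/\mathsf{C}$ users in the cluster rather than from one user's samples alone.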
Related papers
- Restless Linear Bandits [5.00389879175348]
It is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,\ t \in \mathbb{N})$ which gives rise to pay-offs.
An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate.
arXiv Detail & Related papers (2024-05-17T14:37:39Z) - Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints [46.65419724935037]
We consider the problem of \emph{blocked} collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
\texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(\sqrt{\mathsf{N}\mathsf{M}^{-1}})})$ under a budget constraint.
arXiv Detail & Related papers (2023-10-31T11:04:21Z) - Online Low Rank Matrix Completion [28.316718791239303]
We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds.
For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix.
Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.
arXiv Detail & Related papers (2022-09-08T18:49:10Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Collaborative Learning and Personalization in Multi-Agent Stochastic
Linear Bandits [24.293155063082438]
We consider the problem of minimizing regret in an $N$ agent heterogeneous linear bandits framework, where the agents (users) are similar but not all identical.
We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a `well separated' cluster, or as $\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$ if its cluster is not well separated.
arXiv Detail & Related papers (2021-06-15T00:45:55Z) - Sleeping Combinatorial Bandits [15.004764451770441]
We adapt the well-known CUCB algorithm in the sleeping bandits setting and refer to it as CSUCB.
We prove -- under mild conditions -- that the CSUCB algorithm achieves an $O(\sqrt{T\log(T)})$ instance-dependent regret guarantee.
Our results are quite general and hold under general environments -- such as non-additive reward functions, volatile arm availability, a variable number of base-arms to be pulled -- arising in practical applications.
arXiv Detail & Related papers (2021-06-03T06:49:44Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\ \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.