Optimal Algorithms for Latent Bandits with Cluster Structure
- URL: http://arxiv.org/abs/2301.07040v3
- Date: Tue, 11 Jul 2023 07:04:02 GMT
- Title: Optimal Algorithms for Latent Bandits with Cluster Structure
- Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
- Abstract summary: We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
- Score: 50.44722775727619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of latent bandits with cluster structure where there
are multiple users, each with an associated multi-armed bandit problem. These
users are grouped into \emph{latent} clusters such that the mean reward vectors
of users within the same cluster are identical. At each round, a user, selected
uniformly at random, pulls an arm and observes a corresponding noisy reward.
The goal of the users is to maximize their cumulative rewards. This problem is
central to practical recommendation systems and has received wide attention of
late \cite{gentile2014online, maillard2014latent}. Now, if each user acts
independently, then they would have to explore each arm independently and a
regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M},
\mathsf{N}$ are the number of arms and users, respectively. Instead, we propose
LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the
latent cluster structure to provide the minimax optimal regret of
$\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of
clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such a
strong regret bound. LATTICE is based on a careful exploitation of arm
information within a cluster while simultaneously clustering users.
Furthermore, it is computationally efficient and requires only
$O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all
$\mathsf{T}$ rounds.
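The high-level recipe in the abstract (explore uniformly, estimate the low-rank reward matrix, cluster users, then exploit pooled within-cluster estimates) can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters ($\mathsf{M}=5$ arms, $\mathsf{N}=12$ users, two clusters), not the actual LATTICE algorithm: the matrix-completion oracle is stood in for by naive per-entry empirical means, and the clustering step is a simple greedy threshold on row distances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: N users in C latent clusters over M arms.
# Users in the same cluster share an identical mean-reward vector,
# so the true N x M reward matrix has rank at most C.
M, N, C, T = 5, 12, 2, 6000
cluster_of = rng.integers(0, C, size=N)
cluster_means = rng.uniform(0.1, 0.9, size=(C, M))
means = cluster_means[cluster_of]

# Phase 1 (explore): play arms uniformly and record noisy rewards.
explore_rounds = T // 3
counts = np.zeros((N, M))
sums = np.zeros((N, M))
for _ in range(explore_rounds):
    u = rng.integers(N)                  # a user arrives uniformly at random
    a = rng.integers(M)
    counts[u, a] += 1
    sums[u, a] += means[u, a] + rng.normal(0, 0.1)
est = np.divide(sums, np.maximum(counts, 1))

# Cluster users by distance between estimated reward rows
# (a crude stand-in for LATTICE's matrix-completion + clustering step).
labels = -np.ones(N, dtype=int)
next_label = 0
for u in range(N):
    if labels[u] >= 0:
        continue
    labels[u] = next_label
    for v in range(u + 1, N):
        if labels[v] < 0 and np.linalg.norm(est[u] - est[v]) < 0.5:
            labels[v] = next_label
    next_label += 1

# Phase 2 (exploit): pool data within each estimated cluster and
# commit to that cluster's empirically best arm.
best_arm = {}
for c in range(next_label):
    members = labels == c
    pooled = sums[members].sum(0) / np.maximum(counts[members].sum(0), 1)
    best_arm[c] = int(pooled.argmax())

correct = sum(best_arm[labels[u]] == int(means[u].argmax()) for u in range(N))
print(f"{correct}/{N} users routed to their optimal arm")
```

Pooling within clusters is what buys the improvement the abstract describes: each arm's mean is estimated from the samples of all $\approx \mathsf{N}/\mathsf{C}$ users in the cluster rather than from one user's samples alone.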
Related papers
- Restless Linear Bandits [5.00389879175348]
It is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,\ t \in \mathbb{N})$ which gives rise to pay-offs.
An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate.
arXiv Detail & Related papers (2024-05-17T14:37:39Z) - Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints [46.65419724935037]
We consider the problem of \emph{blocked} collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
\texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(\sqrt{\mathsf{N}\mathsf{M}^{-1}})})$ under a budget constraint.
arXiv Detail & Related papers (2023-10-31T11:04:21Z) - Online Low Rank Matrix Completion [28.316718791239303]
We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds.
For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix.
Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.
arXiv Detail & Related papers (2022-09-08T18:49:10Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Collaborative Learning and Personalization in Multi-Agent Stochastic
Linear Bandits [24.293155063082438]
We consider the problem of minimizing regret in an $N$ agent heterogeneous linear bandits framework, where the agents (users) are similar but not all identical.
We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a `well separated' cluster, or as $\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$ if its cluster is not well separated.
arXiv Detail & Related papers (2021-06-15T00:45:55Z) - Sleeping Combinatorial Bandits [15.004764451770441]
We adapt the well-known CUCB algorithm in the sleeping bandits setting and refer to it as CSUCB.
We prove -- under mild conditions -- that the CSUCB algorithm achieves an $O(\sqrt{T\log(T)})$ instance-dependent regret guarantee.
Our results are quite general and hold under general environments -- such as non-additive reward functions, volatile arm availability, a variable number of base-arms to be pulled -- arising in practical applications.
arXiv Detail & Related papers (2021-06-03T06:49:44Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\ \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.