Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints
- URL: http://arxiv.org/abs/2311.03376v1
- Date: Tue, 31 Oct 2023 11:04:21 GMT
- Title: Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints
- Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
- Abstract summary: We consider the problem of \emph{blocked} collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
\texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under a budget constraint.
- Score: 46.65419724935037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of \emph{blocked} collaborative bandits where there
are multiple users, each with an associated multi-armed bandit problem. These
users are grouped into \emph{latent} clusters such that the mean reward vectors
of users within the same cluster are identical. Our goal is to design
algorithms that maximize the cumulative reward accrued by all the users over
time, under the \emph{constraint} that no arm of a user is pulled more than
$\mathsf{B}$ times. This problem has been originally considered by
\cite{Bresler:2014}, and designing regret-optimal algorithms for it has since
remained an open problem. In this work, we propose an algorithm called
\texttt{B-LATTICE} (Blocked Latent bAndiTs via maTrIx ComplEtion) that
collaborates across users, while simultaneously satisfying the budget
constraints, to maximize their cumulative rewards. Theoretically, under certain
reasonable assumptions on the latent structure, with $\mathsf{M}$ users,
$\mathsf{N}$ arms, $\mathsf{T}$ rounds per user, and $\mathsf{C}=O(1)$ latent
clusters, \texttt{B-LATTICE} achieves a per-user regret of
$\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under a budget
constraint of $\mathsf{B}=\Theta(\log \mathsf{T})$. These are the first
sub-linear regret bounds for this problem, and match the minimax regret bounds
when $\mathsf{B}=\mathsf{T}$. Empirically, we demonstrate that our algorithm
has superior performance over baselines even when $\mathsf{B}=1$.
\texttt{B-LATTICE} runs in phases where in each phase it clusters users into
groups and collaborates across users within a group to quickly learn their
reward models.
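The phased cluster-then-collaborate pattern described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual \texttt{B-LATTICE} procedure (which relies on low-rank matrix completion); the function name, the crude clustering step, and the Gaussian noise model are all assumptions introduced for illustration:

```python
import numpy as np

def phased_blocked_play(rewards, B, n_phases=3, n_clusters=2, rng=None):
    """Hypothetical phased play for a blocked bandit: M users x N arms,
    each (user, arm) pair pulled at most B times.

    rewards: (M, N) matrix of mean rewards (assumed clustered by user).
    Returns (total reward collected, per-(user, arm) pull counts).
    """
    rng = np.random.default_rng(rng)
    M, N = rewards.shape
    pulls = np.zeros((M, N), dtype=int)   # enforces the per-item budget B
    estimates = np.zeros((M, N))          # running empirical mean rewards
    total = 0.0

    for _ in range(n_phases):
        # 1) Cluster users on their current reward estimates
        #    (one crude nearest-center assignment, for illustration).
        centers = estimates[rng.choice(M, size=n_clusters, replace=False)]
        labels = np.argmin(
            ((estimates[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
            axis=1,
        )

        # 2) Within each cluster, pool observations across users and pull
        #    the cluster's current best arm that still has budget left.
        for c in range(n_clusters):
            users = np.where(labels == c)[0]
            if len(users) == 0:
                continue
            pooled = estimates[users].mean(axis=0)  # shared cluster estimate
            best_arms = np.argsort(-pooled)
            for u in users:
                for a in best_arms:
                    if pulls[u, a] >= B:            # budget exhausted: skip arm
                        continue
                    r = rewards[u, a] + 0.1 * rng.standard_normal()
                    pulls[u, a] += 1
                    # incremental update of the empirical mean
                    estimates[u, a] += (r - estimates[u, a]) / pulls[u, a]
                    total += r
                    break                           # one pull per user per phase
    return total, pulls
```

The budget check before each pull is what distinguishes the blocked setting from the standard latent-bandit setting: even the best arm must be abandoned once its $\mathsf{B}$ pulls are spent, forcing rotation through the top arms of each cluster.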
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z)
- On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure-exploration version, $\mathbf{m}$-BAI.
As a consequence, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z)
- Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which exploits the latent cluster structure to achieve the minimax-optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z)
- Online Low Rank Matrix Completion [28.316718791239303]
We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds.
For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix.
Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax-optimal rate in $\mathsf{T}$.
arXiv Detail & Related papers (2022-09-08T18:49:10Z)
- On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z)
- Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- Optimal Strategies for Graph-Structured Bandits [0.0]
We study a structured variant of the multi-armed bandit problem, specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in \mathcal{A},\, b \in \mathcal{B}}$ with means $(\mu_{a,b})_{a \in \mathcal{A},\, b \in \mathcal{B}}$.
arXiv Detail & Related papers (2020-07-07T06:27:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.