Online Low Rank Matrix Completion
- URL: http://arxiv.org/abs/2209.03997v1
- Date: Thu, 8 Sep 2022 18:49:10 GMT
- Title: Online Low Rank Matrix Completion
- Authors: Prateek Jain and Soumyabrata Pal
- Abstract summary: We study the problem of textitonline low-rank matrix completion with $mathsfM$ users, $mathsfN$ items and $mathsfT rounds.
For each, we obtain a (trivial) reward sampled from a low-linear user-item reward matrix.
Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain nearly minimax optimal rate in $mathsfT$.
- Score: 28.316718791239303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of \textit{online} low-rank matrix completion with
$\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round,
we recommend one item per user. For each recommendation, we obtain a (noisy)
reward sampled from a low-rank user-item reward matrix. The goal is to design
an online method with sub-linear regret (in $\mathsf{T}$). While the problem
can be mapped to the standard multi-armed bandit problem where each item is an
\textit{independent} arm, it leads to poor regret as the correlation between
arms and users is not exploited. In contrast, exploiting the low-rank structure
of reward matrix is challenging due to non-convexity of low-rank manifold. We
overcome this challenge using an explore-then-commit (ETC) approach that
ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N})
\mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog}
(\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get
non-trivial solution. We further improve our result for the rank-$1$ setting.
Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using
iterAtive user cLustering) that ensures nearly optimal regret bound of
$O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm
uses a novel technique of clustering users and eliminating items jointly and
iteratively, which allows us to obtain nearly minimax optimal rate in
$\mathsf{T}$.
Related papers
- Online Matrix Completion: A Collaborative Approach with Hott Items [19.781869063637387]
We investigate the low rank matrix completion problem in an online setting with $M$ users, $N$ items, $T$ rounds, and an unknown rank-$r$ reward matrix $Rin mathbbRMtimes N$.
arXiv Detail & Related papers (2024-08-11T18:49:52Z) - Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves mistake bound $eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k) Delta T - o(T)$ bigger than choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z) - Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints [46.65419724935037]
We consider the problem of emphblocked collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
textttB-LATTICE achieves a per-user regret of $widetildeO(sqrtmathsfT(sqrtmathsfNmathsfM-1)$ under a budget constraint.
arXiv Detail & Related papers (2023-10-31T11:04:21Z) - SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z) - Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $widetildeO(sqrt(mathsfM+mathsfN)mathsfT.
arXiv Detail & Related papers (2023-01-17T17:49:04Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $mathcalO(minsqrtT, sqrtmathcalETfrac13)$ when $mathcalE$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.