Fugu-MT Paper Translation (Abstract): Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints

Paper abstract: Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints

arxiv url: http://arxiv.org/abs/2311.03376v1
Date: Tue, 31 Oct 2023 11:04:21 GMT
Status: Translation complete
System update date: 2023-11-12 19:32:15.635834
Title: Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints
Title (reference translation): Blocked Collaborative Bandits: Online Collaborative Filtering
Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
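Abstract summary: This paper considers the problem of blocked collaborative bandits with multiple users. Our goal is to design algorithms that maximize the cumulative reward accrued by all users over time. \texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under the budget constraint.
Reference score (independently computed attention score): 46.65419724935037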
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of \emph{blocked} collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time, under the \emph{constraint} that no arm of a user is pulled more than $\mathsf{B}$ times. This problem was originally considered by \cite{Bresler:2014}, and designing regret-optimal algorithms for it has since remained an open problem. In this work, we propose an algorithm called \texttt{B-LATTICE} (Blocked Latent bAndiTs via maTrIx ComplEtion) that collaborates across users, while simultaneously satisfying the budget constraints, to maximize their cumulative rewards. Theoretically, under certain reasonable assumptions on the latent structure, with $\mathsf{M}$ users, $\mathsf{N}$ arms, $\mathsf{T}$ rounds per user, and $\mathsf{C}=O(1)$ latent clusters, \texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under a budget constraint of $\mathsf{B}=\Theta(\log \mathsf{T})$. These are the first sub-linear regret bounds for this problem, and match the minimax regret bounds when $\mathsf{B}=\mathsf{T}$. Empirically, we demonstrate that our algorithm has superior performance over baselines even when $\mathsf{B}=1$. \texttt{B-LATTICE} runs in phases where in each phase it clusters users into groups and collaborates across users within a group to quickly learn their reward models.
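Abstract (reference translation): We consider the problem of \emph{blocked} collaborative bandits, in which there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. Our goal is to design algorithms that maximize the cumulative reward of all users under the \emph{constraint} that no arm of a user is pulled more than $\mathsf{B}$ times. This problem was originally considered by \cite{Bresler:2014}, and the design of regret-optimal algorithms for it has remained an open problem. In this work, we propose \texttt{B-LATTICE} (Blocked Latent bAndiTs via maTrIx ComplEtion), an algorithm that collaborates across users while simultaneously satisfying the budget constraints, in order to maximize cumulative rewards. Theoretically, under certain reasonable assumptions on the latent structure, with $\mathsf{M}$ users, $\mathsf{N}$ arms, $\mathsf{T}$ rounds per user, and $\mathsf{C}=O(1)$ latent clusters, \texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under a budget constraint of $\mathsf{B}=\Theta(\log \mathsf{T})$. These are the first sub-linear regret bounds for this problem, and they match the minimax regret bounds when $\mathsf{B}=\mathsf{T}$. Empirically, the algorithm outperforms baselines even when $\mathsf{B}=1$. \texttt{B-LATTICE} runs in phases; in each phase it clusters users into groups and collaborates across users within a group to quickly learn their reward models.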
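To make the setting in the abstract concrete, the following is a minimal Python sketch of the blocked constraint and of a per-user regret computation. It is not B-LATTICE: the policy shown picks a uniformly random arm that still has budget left and does not collaborate across users. The function names, the default sizes, and the oracle used as the regret benchmark (pull the highest-mean arms $\mathsf{B}$ times each until $\mathsf{T}$ rounds are used) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def oracle_value(mu, T, B):
    """Best expected reward for one user when each arm may be pulled at most B times.

    Assumed benchmark (hypothetical): pull arms in decreasing order of mean reward,
    B times each, until T rounds are used.
    """
    assert T <= B * mu.size, "the budget must allow T pulls in total"
    best_first = np.sort(mu)[::-1]           # arm means, largest first
    schedule = np.repeat(best_first, B)[:T]  # each arm appears at most B times
    return float(schedule.sum())


def run_random_blocked_policy(M=6, N=12, C=2, T=8, B=2, seed=0):
    """Simulate M users, N arms, C latent clusters, T rounds, per-arm budget B.

    Illustrative baseline only: every user pulls a random arm with remaining budget.
    Returns the (M, N) pull counts and the per-user pseudo-regret against the oracle.
    """
    rng = np.random.default_rng(seed)
    cluster_means = rng.uniform(0.0, 1.0, size=(C, N))  # one mean reward vector per cluster
    labels = rng.integers(0, C, size=M)                 # latent cluster of each user
    mu = cluster_means[labels]                          # users in a cluster share their means

    pulls = np.zeros((M, N), dtype=int)
    earned = np.zeros(M)                                # expected reward collected per user
    for _ in range(T):
        for u in range(M):
            allowed = np.flatnonzero(pulls[u] < B)      # arms not yet blocked for user u
            arm = rng.choice(allowed)
            pulls[u, arm] += 1
            earned[u] += mu[u, arm]

    oracle = np.array([oracle_value(mu[u], T, B) for u in range(M)])
    return pulls, oracle - earned


if __name__ == "__main__":
    pulls, regret = run_random_blocked_policy()
    print("max pulls of any (user, arm) pair:", pulls.max())
    print("per-user pseudo-regret:", np.round(regret, 3))
```

The regret here is measured with mean rewards rather than noisy observations, so it is non-negative for every user by construction. A collaborative method would additionally pool observations from users it believes share a cluster, which this sketch deliberately leaves out.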

Related papers