Fugu-MT 論文翻訳(概要): Optimal Algorithms for Latent Bandits with Cluster Structure

論文の概要: Optimal Algorithms for Latent Bandits with Cluster Structure

arxiv url: http://arxiv.org/abs/2301.07040v3
Date: Tue, 11 Jul 2023 07:04:02 GMT
ステータス: 翻訳完了
システム内更新日: 2023-07-12 18:57:46.700347
Title: Optimal Algorithms for Latent Bandits with Cluster Structure
Title（参考訳）: クラスター構造をもつ潜在バンディットの最適アルゴリズム
Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
Abstract要約: 本稿では,複数のユーザが存在するクラスタ構造を持つ潜伏包帯問題と関連するマルチアーム包帯問題とを考察する。本稿では,潜伏クラスタ構造を利用して$widetildeO(sqrt(mathsfM+mathsfN)mathsfTの最小限の後悔を提供するLATTICEを提案する。
参考スコア（独自算出の注目度）: 50.44722775727619
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
Abstract（参考訳）: 本稿では,複数のユーザが存在するクラスタ構造を持つ潜伏包帯問題と関連するマルチアーム包帯問題とを考察する。これらのユーザは,同一クラスタ内のユーザの平均報酬ベクトルが同一になるように,\emph{latent}クラスタにグループ化される。各ラウンドにおいて、ランダムに選択されたユーザは、腕を引っ張り、対応する騒がしい報酬を観察する。ユーザーの目標は累積報酬を最大化することだ。この問題は実用的なレコメンデーションシステムの中心であり、late \cite{gentile2014online, maillard2014latent} の注目を集めている。さて、もし各ユーザーが独立して振る舞うなら、それぞれの腕を独立に探索し、$\omega(\sqrt{\mathsf{mnt}})$の後悔は避けられない、ただし$\mathsf{m} と \mathsf{n}$ はそれぞれ腕の数とユーザ数である。代わりに、潜在クラスタ構造の活用により、クラスタ数が$\widetilde{o}(1)$である場合に、$\widetilde{o}(\sqrt{o}(\mathsf{m}+\mathsf{n})\mathsf{t}})$の最小の最適後悔を与える格子(行列完了によるラテンバンド)を提案する。これはそのような強い後悔の束縛を保証する最初のアルゴリズムである。 latticeは、ユーザをクラスタリングしながら、クラスタ内のarm情報の慎重な活用に基づいている。さらに、計算効率が良く、すべての$\mathsf{T}$ラウンドでオフライン行列補完オラクルを呼び出すのに$O(\log{\mathsf{T}})$しか必要としない。

関連論文リスト

Near Optimal Best Arm Identification for Clustered Bandits [10.146588539113614]
我々は、M$エージェントを$M$クラスタにグループ化し、各クラスタがバンドイト問題を解決する。本稿では,クラスタリング,ベストアーム識別(Cl-BAI),ベストアーム識別(BAI-Cl)の2つの新しいアルゴリズムを提案する。 Cl-BAIは、学習しているバンディット問題に基づいて最初のクラスタエージェントを2フェーズのアプローチで処理し、続いて各クラスタに最適なアームを特定する。 BAI-Clは、まず最適な腕を特定し、次にそれに従ってクラスタリングエージェントを識別することで、配列を反転させる。
論文参考訳（メタデータ） (2025-05-15T10:20:26Z)
Restless Linear Bandits [5.00389879175348]
未知の$mathbbRd$-valued stationary $varphi$-mixing sequence of parameters $(theta_t,t in mathbbN)$ が存在すると仮定される。指数混合率が$theta_t$の場合、LinMix-UCBと呼ばれる楽観的なアルゴリズムが提案される。
論文参考訳（メタデータ） (2024-05-17T14:37:39Z)
Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints [46.65419724935037]
本稿では,複数のユーザを抱えるエンブロック型協調バンドイットの問題点について考察する。私たちのゴールは、時間とともにすべてのユーザーが獲得した累積報酬を最大化するアルゴリズムを設計することです。 textttB-LATTICEは、予算制約の下で、ユーザ毎に$widetildeO(sqrtmathsfT(sqrtmathsfM-1)$を後悔する。
論文参考訳（メタデータ） (2023-10-31T11:04:21Z)
Combinatorial Neural Bandits [10.463365653675694]
学習エージェントが各ラウンドでアームのサブセットを選択し、そのスコアに応じて選択したアームのフィードバックを受け取るというコンテキスト的盗聴問題を考える。アルゴリズムを提案する: Combinatorial Neural UCB(textttCN-UCB)と Combinatorial Thompson Sampling(textttCN-TS$)。
論文参考訳（メタデータ） (2023-05-31T23:27:58Z)
Online Low Rank Matrix Completion [28.316718791239303]
我々は,textitonline Low-rank matrix completion with $mathsfM$ users, $mathsfN$ items and $mathsfT roundsについて検討した。それぞれ、低線形ユーザ-イテム報酬行列からサンプリングされた(自明な)報酬を得る。提案アルゴリズムは,ユーザをクラスタリングし,アイテムを協調的かつ反復的に除去する新しい手法を用いて,$mathsfT$で最小値に近い最適なレートを得ることができる。
論文参考訳（メタデータ） (2022-09-08T18:49:10Z)
On Submodular Contextual Bandits [92.45432756301231]
作用が基底集合の部分集合であり、平均報酬が未知の単調部分モジュラ函数によってモデル化されるような文脈的包帯の問題を考える。 Inverse Gap Weighting 戦略により,提案アルゴリズムは推定関数の局所的最適度を効率よくランダム化することを示す。
論文参考訳（メタデータ） (2021-12-03T21:42:33Z)
Contextual Combinatorial Bandits with Changing Action Sets via Gaussian Processes [8.919345630832366]
本稿では,アクションセットと時間変化によるベースアームの可利用性に関するコンテキスト的帯域幅問題について考察する。我々は,カーネル上信頼境界(O'CLOK-UCB)を用いた最適組合せ学習と最適化というアルゴリズムを提案する。アルゴリズムを劇的に高速化するために,スパースGPを用いたO'CLOK-UCBの変種を提案する。
論文参考訳（メタデータ） (2021-10-05T18:02:10Z)
Collaborative Learning and Personalization in Multi-Agent Stochastic Linear Bandits [24.293155063082438]
エージェント(ユーザ)が似ているが、すべて同一ではないような、N$エージェントの不均一な線形帯域幅フレームワークにおける後悔を最小限に抑える問題を考える。任意のエージェントに対して、後悔のスケールが$mathcalO(sqrtT/N)$、エージェントが十分に分離されたクラスタにある場合、あるいはクラスタが$mathcalO(Tfrac12 + varepsilon/(N)frac12 -varepsilon)$であることを示す。
論文参考訳（メタデータ） (2021-06-15T00:45:55Z)
Sleeping Combinatorial Bandits [15.004764451770441]
睡眠帯域設定においてよく知られたCUCBアルゴリズムを適用し,それをCSUCBと呼ぶ。穏やかな条件下では、CSUCBアルゴリズムが$O(sqrtT log (T)$ instance-dependent regret guaranteeを達成することを証明します。我々の結果は極めて一般的なものであり、非付加的な報酬関数、揮発性アームの可用性、引くべきベースアームの変動数など、実用的な応用によって生じる一般的な環境下で維持されます。
論文参考訳（メタデータ） (2021-06-03T06:49:44Z)
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
我々は、腕の総数が膨大であることができるトップ$ k$極端な文脈的包帯問題を研究します。まず,Inverse Gap Weighting戦略を用いて,非極端に実現可能な設定のアルゴリズムを提案する。我々のアルゴリズムは、$O(ksqrt(A-k+1)T log (|mathcalF|T))$である。
論文参考訳（メタデータ） (2021-02-15T19:10:52Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
我々は,損失予測の助けを借りて,文脈的包帯における学習を研究する。最適な後悔は$mathcalO(minsqrtT, sqrtmathcalETfrac13)$である。
論文参考訳（メタデータ） (2020-03-04T07:36:38Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。