Fugu-MT Paper Translation (Abstract): Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Paper abstract: Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

arxiv url: http://arxiv.org/abs/2401.07298v1
Date: Sun, 14 Jan 2024 14:14:19 GMT
Status: Translation complete
Last updated in system: 2024-01-17 18:56:36.579781
Title: Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems
Title (reference translation): Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems
Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
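Abstract summary: We study the generalized low-rank matrix bandit problem proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical constraints of existing algorithms, we first propose the G-ESTT framework. We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$, while G-ESTS can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$.
Reference score (independently computed attention score): 61.85150061213987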
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which was recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical constraints of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we markedly improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$, while G-ESTS can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, under mild assumptions and up to logarithmic terms, where $M$ is a problem-dependent value. Under the reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments on a suite of simulations to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods.
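The consistency claim for G-ESTT can be checked by direct substitution. The following is only a worked sketch: the constant $C$ is a hypothetical placeholder for the constant hidden in $M = O((d_1+d_2)^2)$, and the $1/D_{rr}$ factor of the bound from \cite{lu2021low} is left aside because the abstract does not define $D_{rr}$.

```latex
\begin{align*}
\sqrt{(d_1+d_2)\,M\,r\,T}\,\Big|_{M = C\,(d_1+d_2)^2}
  &= \sqrt{C\,(d_1+d_2)^{3}\,r\,T} \\
  &= \sqrt{C}\,(d_1+d_2)^{3/2}\sqrt{rT}.
\end{align*}
```

Under this assumption the G-ESTT bound has the same dependence on $d_1+d_2$, $r$, and $T$ as $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}/D_{rr})$.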
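Abstract (reference translation): In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and a fixed but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$. In this paper, we study the generalized low-rank matrix bandit problem recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical constraints of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea of \cite{jun2019bilinear} by using Stein's method in the subspace estimation and leverages the estimated subspaces via a regularization idea. Furthermore, we significantly improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$, while G-ESTS can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, under mild assumptions and up to logarithmic terms. Under the reasonable assumption that $M = O((d_1+d_2)^2)$, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low}. For completeness, we conduct experiments showing that the proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods across a suite of simulations.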
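To make the problem setting in the abstract concrete, here is a minimal simulation sketch of a generalized low-rank matrix bandit environment. It is not G-ESTT or G-ESTS. The logistic link, the Gaussian arm features, the finite arm set, the uniformly random policy, and all function and variable names are assumptions made purely for illustration.

```python
import numpy as np

def make_low_rank_theta(d1, d2, r, rng):
    """Draw a random d1 x d2 matrix of rank r with unit Frobenius norm."""
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    theta = U @ V.T
    return theta / np.linalg.norm(theta)  # default matrix norm is Frobenius

def sigmoid(z):
    """Logistic link function (an assumed choice of GLM link)."""
    return 1.0 / (1.0 + np.exp(-z))

def expected_rewards(arms, theta, link=sigmoid):
    """Expected reward link(<X, Theta>) per arm, with <X, Theta> = trace(X^T Theta)."""
    inner = np.einsum("kij,ij->k", arms, theta)
    return link(inner)

rng = np.random.default_rng(0)
d1, d2, r, n_arms, T = 8, 6, 2, 20, 200

theta_star = make_low_rank_theta(d1, d2, r, rng)   # unknown to the agent
arms = rng.standard_normal((n_arms, d1, d2))       # feature matrix of each action
mu = expected_rewards(arms, theta_star)

# Baseline: a uniformly random policy, shown only to illustrate cumulative regret.
choices = rng.integers(0, n_arms, size=T)
rewards = rng.binomial(1, mu[choices])             # Bernoulli rewards under the logistic link
regret = np.sum(mu.max() - mu[choices])            # cumulative pseudo-regret
print(f"cumulative regret of random policy over T={T}: {regret:.2f}")
```

A bandit algorithm would replace the random `choices` with actions selected from past `(arm, reward)` pairs, and its regret would be measured the same way.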

Related papers