Fugu-MT paper translation (abstract): On the Minimax Regret for Online Learning with Feedback Graphs

Paper abstract: On the Minimax Regret for Online Learning with Feedback Graphs

arxiv url: http://arxiv.org/abs/2305.15383v2
Date: Sat, 28 Oct 2023 14:11:51 GMT
Status: translation complete
Last updated in system: 2023-10-31 22:17:42.797477
Title: On the Minimax Regret for Online Learning with Feedback Graphs
Authors: Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, Nicolò Cesa-Bianchi
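Abstract summary: We improve the upper and lower bounds on the regret of online learning with strongly observable undirected feedback graphs. The improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts.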
Reference score (independently computed attention score): 5.721380617450645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
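The three rates quoted in the abstract can be compared numerically. The following is a minimal sketch, assuming all constants hidden by the $\mathcal{O}$ and $\Omega$ notation are set to 1, so the printed values indicate scaling only and say nothing about actual regret. The function names and the sample values of $K$ and $T$ are illustrative and do not come from the paper.

```python
import math


def previous_upper_bound(alpha: int, K: int, T: int) -> float:
    """sqrt(alpha * T * ln K): the previously best known upper bound (constants dropped)."""
    return math.sqrt(alpha * T * math.log(K))


def improved_upper_bound(alpha: int, K: int, T: int) -> float:
    """sqrt(alpha * T * (1 + ln(K / alpha))): the improved upper bound (constants dropped)."""
    return math.sqrt(alpha * T * (1 + math.log(K / alpha)))


def improved_lower_bound(alpha: int, K: int, T: int) -> float:
    """sqrt(alpha * T * ln K / ln alpha): the improved lower bound, stated for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the improved lower bound is stated only for alpha > 1")
    return math.sqrt(alpha * T * math.log(K) / math.log(alpha))


if __name__ == "__main__":
    K, T = 1024, 10_000  # illustrative values, not taken from the paper
    for alpha in (1, 4, 32, 256, 1024):
        old = previous_upper_bound(alpha, K, T)
        new = improved_upper_bound(alpha, K, T)
        # The lower bound formula is only defined for alpha > 1.
        low = improved_lower_bound(alpha, K, T) if alpha > 1 else float("nan")
        print(f"alpha={alpha:5d}  old_upper={old:10.1f}  new_upper={new:10.1f}  lower={low:10.1f}")
```

With $\alpha = K$ the improved upper bound reduces to $\sqrt{KT}$ (the bandits rate), and with $\alpha = 1$ it reduces to $\sqrt{T(1+\ln K)}$, which has the same order as the experts rate $\sqrt{T\ln K}$.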
