Fugu-MT Paper Translation (Abstract): A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

Paper overview: A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

arxiv url: http://arxiv.org/abs/2002.00315v2
Date: Tue, 23 Jun 2020 02:06:49 GMT
Status: Translation complete
Last updated in system: 2023-01-04 19:48:08.838070
Title: A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
Title (reference translation): On Small-loss Bounds for Bandits with Graph Feedback
Authors: Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang
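Abstract summary: We study small-loss bounds for adversarial multi-armed bandits with graph feedback. We derive the first small-loss bound for general strongly observable graphs. We also make a first attempt at deriving small-loss bounds for weakly observable graphs.
Reference score (independently computed attention score): 39.78074016649885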
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem of Lykouris et al. (2018). Specifically, we develop an algorithm with regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm, and for the special case of self-aware graphs where every arm has a self-loop, we improve the regret to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$ where $\alpha \leq \kappa$ is the independence number. Our results significantly improve and extend those by Lykouris et al. (2018) who only consider self-aware undirected graphs. Furthermore, we also take the first attempt at deriving small-loss bounds for weakly observable graphs. We first prove that no typical small-loss bounds are achievable in this case, and then propose algorithms with alternative small-loss bounds in terms of the loss of some specific subset of arms. A surprising side result is that $\mathcal{\tilde{O}}(\sqrt{T})$ regret is achievable even for weakly observable graphs as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that might be of independent interest. Moreover, all our algorithms can be made parameter-free without the knowledge of the environment.
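Abstract (reference translation): This work studies small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities rather than on the number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem of Lykouris et al. (2018). Specifically, we obtain regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$, where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm; for the special case of self-aware graphs, in which every arm has a self-loop, the regret improves to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$, where $\alpha \leq \kappa$ is the independence number. Our results improve and extend those of Lykouris et al. (2018), who consider only self-aware undirected graphs. We also make a first attempt at deriving small-loss bounds for weakly observable graphs. For this case we first prove that no typical small-loss bound is achievable, and then propose algorithms with alternative small-loss bounds stated in terms of the loss of a specific subset of arms. A surprising side result is that $\mathcal{\tilde{O}}(\sqrt{T})$ regret is achievable even for weakly observable graphs, as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that may be of independent interest. Moreover, all our algorithms can be made parameter-free, without knowledge of the environment.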
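
The two graph quantities in the bounds above can be made concrete on a small example. The following is only an illustrative sketch, not the paper's algorithm: it assumes an undirected feedback graph in which every arm also has a self-loop (the self-aware case), computes the independence number $\alpha$ and the clique partition number $\kappa$ by brute force, and compares $\sqrt{\alpha T}$ with $\sqrt{\kappa L_*}$ while ignoring constants and logarithmic factors. The function names and the values of `T` and `L_star` are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def independence_number(n, edges):
    """Largest set of vertices with no edge between any two (brute force)."""
    adj = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in adj for p in combinations(subset, 2)):
                return size
    return 0


def clique_partition_number(n, edges):
    """Fewest cliques that partition all vertices (brute-force search)."""
    adj = {frozenset(e) for e in edges}
    best = n  # a partition into singletons always exists

    def place(v, groups):
        nonlocal best
        if len(groups) >= best:
            return  # cannot improve on the best partition found so far
        if v == n:
            best = len(groups)
            return
        # Try adding v to each existing clique it is fully connected to.
        for g in groups:
            if all(frozenset((v, u)) in adj for u in g):
                g.append(v)
                place(v + 1, groups)
                g.pop()
        # Or open a new clique containing only v.
        groups.append([v])
        place(v + 1, groups)
        groups.pop()

    place(0, [])
    return best


# Hypothetical example: 5 arms arranged in a cycle (self-loops are implicit).
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
alpha = independence_number(5, cycle5)
kappa = clique_partition_number(5, cycle5)

# Assumed values: T rounds, best-arm loss L_star <= T (losses taken in [0, 1]).
T, L_star = 10_000, 100
bound = min(sqrt(alpha * T), sqrt(kappa * L_star))
print(alpha, kappa, round(bound, 1))
```

On the 5-cycle, $\alpha = 2$ and $\kappa = 3$, consistent with $\alpha \leq \kappa$; with the assumed values the $\sqrt{\kappa L_*}$ term is the smaller of the two because the best arm's loss is far below $T$.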
