Fugu-MT Paper Translation (Abstract): A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

Paper overview: A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

arxiv url: http://arxiv.org/abs/2202.01850v1
Date: Thu, 3 Feb 2022 21:19:36 GMT
Status: translation complete
Last updated in system: 2022-02-07 14:07:53.124349
Title: A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits
Authors: Ilija Bogunovic, Zihan Li, Andreas Krause, Jonathan Scarlett
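Abstract summary: This paper proposes a robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. The algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation. The authors perform the first empirical study of robustness in the corrupted GP bandit setting and show that the algorithm is robust against a variety of adversarial attacks.
Reference score (attention score computed independently by this site): 118.22458816174144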
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $\gamma_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C \gamma_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T \gamma_T})$ for several commonly-considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
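The comparison between the two corruption-dependent terms in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a squared-exponential kernel, for which the maximal information gain is known to scale as $\gamma_T = O((\log T)^{d+1})$ in dimension $d$, and it drops all constants and lower-order factors. The function name `corruption_terms` and the chosen values of `T`, `C`, and `d` are hypothetical and serve only as illustration.

```python
import math


def corruption_terms(T, C, d=1):
    """Return (new_term, old_term) for the two corruption-dependent rates.

    Assumption (not from the paper's text): squared-exponential kernel,
    so gamma_T is modelled as (log T)^(d+1) with all constants dropped.
    """
    gamma_T = math.log(T) ** (d + 1)
    new_term = C * gamma_T ** 1.5          # C * gamma_T^{3/2}
    old_term = C * math.sqrt(T * gamma_T)  # C * sqrt(T * gamma_T)
    return new_term, old_term


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        new_term, old_term = corruption_terms(T, C=10.0)
        # The ratio equals gamma_T / sqrt(T), independent of C.
        print(f"T={T:>10}  new={new_term:.3e}  old={old_term:.3e}  "
              f"ratio={new_term / old_term:.4f}")
```

Under these assumptions the ratio of the two terms is $\gamma_T / \sqrt{T}$, which tends to zero as $T$ grows. With constants dropped, the new term is not necessarily smaller at small $T$, so the improvement should be read as an asymptotic statement.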

Related papers