Fugu-MT paper translation (abstract): Stochastic Bandits Robust to Adversarial Attacks

Paper summary: Stochastic Bandits Robust to Adversarial Attacks

arxiv url: http://arxiv.org/abs/2408.08859v1
Date: Fri, 16 Aug 2024 17:41:35 GMT
Status: translation complete
Last updated in system: 2024-08-19 14:43:58.079255
Title: Stochastic Bandits Robust to Adversarial Attacks
Title (reference translation): Stochastic Bandits Robust to Adversarial Attacks
Authors: Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili
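Abstract summary: This paper studies multi-armed bandit algorithms that are robust to adversarial attacks. We investigate two cases of this model, with and without knowledge of the attack budget. We devise two types of algorithms whose regret bounds have additive or multiplicative $C$-dependent terms.
Reference score (independently computed attention score): 33.278131584647745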
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and then alter their reward observation. We study two cases of this model, with or without the knowledge of an attack budget $C$, defined as an upper bound of the summation of the difference between the actual and altered rewards. For both cases, we devise two types of algorithms with regret bounds having additive or multiplicative $C$ dependence terms. For the known attack budget case, we prove our algorithms achieve the regret bound of $O((K/\Delta)\log T + KC)$ and $\tilde{O}(\sqrt{KTC})$ for the additive and multiplicative $C$ terms, respectively, where $K$ is the number of arms, $T$ is the time horizon, $\Delta$ is the gap between the expected rewards of the optimal arm and the second-best arm, and $\tilde{O}$ hides the logarithmic factors. For the unknown case, we prove our algorithms achieve the regret bound of $\tilde{O}(\sqrt{KT} + KC^2)$ and $\tilde{O}(KC\sqrt{T})$ for the additive and multiplicative $C$ terms, respectively. In addition to these upper bound results, we provide several lower bounds showing the tightness of our bounds and the optimality of our algorithms. These results delineate an intrinsic separation between the bandits with attacks and corruption models [Lykouris et al., 2018].
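Abstract (reference translation): This paper studies stochastic multi-armed bandit algorithms in a setting where an attacker first observes the learner's action and then alters the reward observation. We study two cases of this model, with and without knowledge of the attack budget $C$, defined as an upper bound on the sum of the differences between the actual and altered rewards. For both cases, we devise two types of algorithms whose regret bounds have additive or multiplicative $C$-dependent terms. For the known attack budget case, we prove that our algorithms achieve regret bounds of $O((K/\Delta)\log T + KC)$ and $\tilde{O}(\sqrt{KTC})$ for the additive and multiplicative $C$ terms, respectively. For the unknown case, we prove that our algorithms achieve regret bounds of $\tilde{O}(\sqrt{KT} + KC^2)$ and $\tilde{O}(KC\sqrt{T})$ for the additive and multiplicative $C$ terms, respectively. In addition to these upper bounds, we provide several lower bounds showing the tightness of the bounds and the optimality of the algorithms. These results highlight an intrinsic separation between bandits with attacks and corruption models [Lykouris et al., 2018].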
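The attack model in the abstract (the attacker sees the pulled arm, then alters the observed reward, with total alteration at most $C$) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's algorithm: the learner is a generic successive-elimination routine whose confidence radius is widened by `budget / pulls` for the known-budget case, rewards are assumed Bernoulli, and the attacker strategy (pushing the optimal arm's observations toward 0) is one hypothetical choice. All names (`BudgetedAttacker`, `run_elimination`) are illustrative.

```python
import math
import random


class BudgetedAttacker:
    """Hypothetical attacker: sees the pulled arm, then lowers the target arm's reward."""

    def __init__(self, budget, target_arm):
        self.remaining = budget        # unspent part of the attack budget C
        self.target_arm = target_arm

    def corrupt(self, arm, reward):
        # The attacker acts only after observing which arm the learner pulled.
        if arm != self.target_arm or self.remaining <= 0:
            return reward
        change = min(reward, self.remaining)   # push the observation toward 0
        self.remaining -= change
        return reward - change


def run_elimination(means, horizon, budget, seed=0):
    """Successive elimination with a radius widened by budget / pulls (assumed known C)."""
    k = len(means)
    assert horizon >= k >= 1 and horizon >= 2
    rng = random.Random(seed)
    best = max(range(k), key=lambda a: means[a])
    attacker = BudgetedAttacker(budget, best)
    active = list(range(k))
    pulls = [0] * k
    sums = [0.0] * k
    regret = 0.0   # pseudo-regret measured with the true means
    spent = 0.0    # total |actual - altered| used by the attacker

    def radius(a):
        # Standard concentration term plus the worst-case shift the attacker can cause.
        return math.sqrt(2.0 * math.log(horizon) / pulls[a]) + budget / pulls[a]

    t = 0
    while t < horizon:
        for arm in list(active):
            if t >= horizon:
                break
            true_reward = 1.0 if rng.random() < means[arm] else 0.0
            seen = attacker.corrupt(arm, true_reward)
            spent += abs(true_reward - seen)
            pulls[arm] += 1
            sums[arm] += seen
            regret += means[best] - means[arm]
            t += 1
        # Drop arms whose upper bound falls below the best lower bound.
        best_lcb = max(sums[a] / pulls[a] - radius(a) for a in active)
        active = [a for a in active if sums[a] / pulls[a] + radius(a) >= best_lcb]
    return regret, spent, active


regret, spent, active = run_elimination([0.9, 0.5, 0.4], horizon=5000, budget=20.0)
print(f"pseudo-regret={regret:.1f}, budget spent={spent:.1f}, surviving arms={active}")
```

Widening the radius by `budget / pulls` reflects that an attacker with budget $C$ can shift an empirical mean over $n$ pulls by at most $C/n$; how the paper's own algorithms use or estimate $C$ is not shown here.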

Related papers

Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits [29.845787788972594]
We show that SAMBA achieves the state-of-the-art $O(K \log T/\Delta) + O(C/\Delta)$ regret upper bound. We also conduct simulations to demonstrate the effectiveness of SAMBA.
Paper reference translation (metadata) (2025-02-19T23:16:18Z)
Achieving Optimal Breakdown for Byzantine Robust Gossip [15.69624587054777]
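This paper studies Byzantine-resilient algorithms in a decentralized setting where devices communicate directly with one another. We introduce $\mathrm{CG}^+$, an algorithm at the intersection of $\mathrm{ClippedGossip}$ and $\mathrm{NNA}$.
Paper reference translation (metadata) (2024-10-14T12:10:52Z)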
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
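Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs). We study a model within this domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose robust contextual dueling bandits (RCDB), an algorithm based on uncertainty-weighted maximum likelihood estimation.
Paper reference translation (metadata) (2024-04-16T17:59:55Z)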
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
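We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes.
Paper reference translation (metadata) (2023-12-24T08:27:30Z)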
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
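We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which no information other than the joint reward of the selected set of $n$ arms is observed. SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
Paper reference translation (metadata) (2023-12-13T11:08:25Z)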
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
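We consider the problem of combining and learning over a set of adversarial bandit algorithms. The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of $\widetilde{O}(\sqrt{d S T})$. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$, the number of base algorithms.
Paper reference translation (metadata) (2022-02-12T21:55:44Z)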
Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
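We study the linear contextual bandit problem in the presence of adversarial corruption. We propose a variance-aware algorithm that adapts to the level of adversarial contamination $C$.
Paper reference translation (metadata) (2021-10-25T02:53:24Z)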
Impact of Representation Learning in Linear Bandits [83.17684841392754]
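We study how representation learning can improve the efficiency of bandit problems. We propose a new algorithm that achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret.
Paper reference translation (metadata) (2020-10-13T16:35:30Z)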
Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
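We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term in general. In the contextual setting, a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
Paper reference translation (metadata) (2020-07-07T09:00:57Z)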
Stochastic Bandits with Linear Constraints [69.757694218456]
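We study a constrained contextual linear bandit setting in which the goal of the agent is to produce a sequence of policies. We propose an upper-confidence-bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
Paper reference translation (metadata) (2020-06-17T22:32:19Z)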

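The related papers list is generated automatically from the titles and abstracts of the papers on this site.

This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).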