Fugu-MT 論文翻訳(概要): Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

論文の概要: Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

arxiv url: http://arxiv.org/abs/2102.07711v1
Date: Mon, 15 Feb 2021 18:02:46 GMT
ステータス: 翻訳完了
システム内更新日: 2021-02-16 16:14:19.857262
Title: Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification
Title（参考訳）: Secure-UCB: 限定データ検証による攻撃からの確率的帯域保護
Authors: Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti
Abstract要約: 我々は,攻撃者が選択した行動と対応する報酬の両方を観察し,付加的な雑音で報酬を汚染できる強力な攻撃モデルを考える。我々は、後悔$O(log T)$のエンファニーバンディットアルゴリズムは、期待される量の汚染で後悔$Omega(T)$に苦しむことを強制することができることを示しています。本稿では,未汚染報酬へのアクセスに限定的なアンフェリフィケーションを用いた新しいアルゴリズムSecure-UCBを提案する。
参考スコア（独自算出の注目度）: 37.107591129228695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that \emph{any} bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such poising attacks, our second main contribution is to propose a novel algorithm, Secure-UCB, which uses limited \emph{verification} to access a limited number of uncontaminated rewards. We show that with $O(\log T)$ expected number of verifications, Secure-UCB can restore the order optimal $O(\log T)$ regret \emph{irrespective of the amount of contamination} used by the attacker. Finally, we prove that for any bandit algorithm, this number of verifications $O(\log T)$ is necessary to recover the order-optimal regret. We can then conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.
Abstract（参考訳）: 本稿では,データ中毒攻撃時のバンディットアルゴリズムについて検討する。我々は,攻撃者が選択した行動と対応する報酬の両方を観察し,付加的な雑音で報酬を汚染できる強力な攻撃モデルを考える。我々は、後悔 $O(\log T)$ を持つ \emph{any} bandit アルゴリズムが、期待される汚染量 $O(\log T)$ で後悔 $\Omega(T)$ に苦しむことを強制できることを示した。この量の汚染も必要であり、後悔する$O(\log T)$ バンディットアルゴリズム、特に古典的 UCB が存在し、後悔する$Omega(\log T)$ の汚染に苦しむために$Omega(\log T)$ の汚染量を必要とすることを証明している。このような攻撃に対抗するために、2つ目の主な貢献は、限定的な \emph{verification} を使用して汚染されていない報酬の限られた数にアクセスする新しいアルゴリズム Secure-UCB を提案することです。 Secure-UCBは,$O(\log T)$期待数の検証を行うと,攻撃者が使用する汚染量の最小値として,最適な$O(\log T)$ regret \emph{irrespectiveを復元できることを示す。最後に、任意の帯域幅アルゴリズムに対して、この検証数$O(\log T)$は順序最適後悔を取り戻すのに必要であることを示す。次に、Secure-UCBは期待される後悔と予想される検証数の両方の観点から順序最適であり、あらゆるデータ中毒攻撃から確率的なバンディットを救えると結論付けることができます。

関連論文リスト

Stochastic Bandits Robust to Adversarial Attacks [33.278131584647745]
本稿では,敵攻撃に対して頑健なマルチアームバンディットアルゴリズムについて検討する。我々は、攻撃予算の知識の有無に関わらず、このモデルの2つのケースを調査する。我々は、加法的あるいは乗法的な$C$依存項を持つ後悔境界を持つ2種類のアルゴリズムを考案する。
論文参考訳（メタデータ） (2024-08-16T17:41:35Z)
Combinatorial Neural Bandits [10.463365653675694]
学習エージェントが各ラウンドでアームのサブセットを選択し、そのスコアに応じて選択したアームのフィードバックを受け取るというコンテキスト的盗聴問題を考える。アルゴリズムを提案する: Combinatorial Neural UCB(textttCN-UCB)と Combinatorial Thompson Sampling(textttCN-TS$)。
論文参考訳（メタデータ） (2023-05-31T23:27:58Z)
Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
クライアントが学習者にコミュニケーション制約のあるフィードバックを提供する分散マルチアームバンディットについて検討する。我々は、この下限を小さな加法係数にマッチさせるマルチフェーズ帯域幅アルゴリズム、$mathtUEtext-UCB++$を提案する。
論文参考訳（メタデータ） (2023-04-25T09:31:20Z)
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
本研究は,リスク許容度が$tau$のCVaR(Conditional Value at Risk)の目的に着目し,リスクに敏感な強化学習(RL)について検討する。ミニマックスCVaRの後悔率は$Omega(sqrttau-1AK)$で、$A$はアクションの数、$K$はエピソード数である。我々は,このアルゴリズムが連続性仮定の下で$widetilde O(tau-1sqrtSAK)$の最適後悔を達成し,一般に近似することを示す。
論文参考訳（メタデータ） (2023-02-07T02:22:31Z)
Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
政策後悔は、適応的な敵に対してオンライン学習アルゴリズムのパフォーマンスを測定するという、よく確立された概念である。我々は,不完全な政策後悔を効果的に最小化できる敵の制限について検討する。我々は、$tildemathcalO(mKsqrtT)$の完全なポリシーを後悔するアルゴリズムを提供し、$tildemathcalO$表記は対数要素だけを隠す。
論文参考訳（メタデータ） (2022-04-24T03:10:27Z)
Sleeping Combinatorial Bandits [15.004764451770441]
睡眠帯域設定においてよく知られたCUCBアルゴリズムを適用し,それをCSUCBと呼ぶ。穏やかな条件下では、CSUCBアルゴリズムが$O(sqrtT log (T)$ instance-dependent regret guaranteeを達成することを証明します。我々の結果は極めて一般的なものであり、非付加的な報酬関数、揮発性アームの可用性、引くべきベースアームの変動数など、実用的な応用によって生じる一般的な環境下で維持されます。
論文参考訳（メタデータ） (2021-06-03T06:49:44Z)
Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions [91.8283876874947]
我々は,T段の固定予算設定において,敵対的腐敗を伴う連続的包帯に対する最適な腕識別(BAI)問題を考察した。我々は, 汚職の量に依存しない新しいランダム化アルゴリズム, Probabilistic Shrinking($u$) (PSS($u$)) を設計する。 CPS が十分に大きいとき、BAI 確率を$Trightarrow infty$ として達成できるアルゴリズムは存在しない。
論文参考訳（メタデータ） (2020-10-15T17:34:26Z)
Impact of Representation Learning in Linear Bandits [83.17684841392754]
本研究では,表現学習が帯域幅問題の効率性を向上させる方法について検討する。我々は,$widetildeO(TsqrtkN + sqrtdkNT)$ regretを達成する新しいアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-10-13T16:35:30Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。