Fugu-MT Paper Translation (Summary): Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

Paper summary: Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

arxiv url: http://arxiv.org/abs/2411.05189v1
Date: Thu, 07 Nov 2024 21:25:58 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:45.604415
Title: Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
Title (reference translation): Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
Authors: Usman Anwar, Johannes Von Oswald, Louis Kirsch, David Krueger, Spencer Frei
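Abstract summary: This work investigates the vulnerability of in-context learning in transformers to hijacking attacks, focusing on the setting of linear regression tasks. We first show that single-layer linear transformers are non-robust and can be manipulated into outputting arbitrary predictions. We then demonstrate that adversarial training improves the robustness of transformers against hijacking attacks, even when it is applied only during fine-tuning.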
Reference score (independently computed attention score): 23.737606860443705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have demonstrated remarkable in-context learning capabilities across various domains, including statistical learning tasks. While previous work has shown that transformers can implement common learning algorithms, the adversarial robustness of these learned algorithms remains unexplored. This work investigates the vulnerability of in-context learning in transformers to hijacking attacks, focusing on the setting of linear regression tasks. Hijacking attacks are prompt-manipulation attacks in which the adversary's goal is to manipulate the prompt to force the transformer to generate a specific output. We first prove that single-layer linear transformers, known to implement gradient descent in-context, are non-robust and can be manipulated to output arbitrary predictions by perturbing a single example in the in-context training set. While our experiments show these attacks succeed on linear transformers, we find they do not transfer to more complex transformers with GPT-2 architectures. Nonetheless, we show that these transformers can be hijacked using gradient-based adversarial attacks. We then demonstrate that adversarial training enhances transformers' robustness against hijacking attacks, even when just applied during finetuning. Additionally, we find that in some settings, adversarial training against a weaker attack model can lead to robustness to a stronger attack model. Lastly, we investigate the transferability of hijacking attacks across transformers of varying scales and initialization seeds, as well as between transformers and ordinary least squares (OLS). We find that while attacks transfer effectively between small-scale transformers, they show poor transferability in other scenarios (small-to-large scale, large-to-large scale, and between transformers and OLS).
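To make the single-layer result concrete, here is a minimal NumPy sketch under stated assumptions. It assumes the simplified form in which a single-layer linear transformer's prediction equals one gradient-descent step on the in-context least-squares loss from zero weights, ŷ = x_q · (lr/n) Σ_i y_i x_i. The learning rate `lr`, the helper names `gd_one_step_predict` and `hijack_single_example`, and the choice to perturb only the input of one context example are hypothetical illustrations, not the paper's exact construction. Because this prediction is linear in any single x_i, shifting one example along x_q can move the output to any target, provided that example's label is nonzero.

```python
import numpy as np


def gd_one_step_predict(X, y, x_query, lr=1.0):
    """Assumed single-layer linear transformer: one GD step on least squares
    from w0 = 0, i.e. w = (lr / n) * X^T y, then predict x_query . w."""
    n = X.shape[0]
    w = (lr / n) * (X.T @ y)
    return float(x_query @ w)


def hijack_single_example(X, y, x_query, target, idx=0, lr=1.0):
    """Hypothetical attack: perturb only the input X[idx] so the prediction
    above equals `target`. Adding delta to X[idx] changes the prediction by
    (lr / n) * y[idx] * (delta . x_query), so the minimum-norm delta points
    along x_query."""
    n = X.shape[0]
    if abs(y[idx]) < 1e-12 or not np.any(x_query):
        raise ValueError("attack needs a nonzero label y[idx] and nonzero x_query")
    residual = target - gd_one_step_predict(X, y, x_query, lr)
    scale = residual * n / (lr * y[idx] * (x_query @ x_query))
    X_adv = X.copy()
    X_adv[idx] = X[idx] + scale * x_query  # only one context row is modified
    return X_adv


rng = np.random.default_rng(0)
d, n = 5, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true  # noiseless linear-regression prompt
x_q = rng.normal(size=d)

X_adv = hijack_single_example(X, y, x_q, target=100.0, idx=3)
print(gd_one_step_predict(X, y, x_q))      # clean prediction
print(gd_one_step_predict(X_adv, y, x_q))  # approximately 100.0, up to float error
```

Since the sketch places no bound on the size of the single perturbation, any target value is reachable, which is the intuition behind the non-robustness claim for this model class. The abstract reports that such attacks do not carry over to GPT-2-style transformers, which instead had to be attacked with gradient-based methods.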
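Abstract (reference translation): Transformers have demonstrated remarkable in-context learning capabilities across a variety of domains, including statistical learning tasks. Previous work has shown that transformers can implement common learning algorithms, but the adversarial robustness of these learned algorithms remains unexplored. This work examines the vulnerability of in-context learning in transformers to hijacking attacks, focusing on the setting of linear regression tasks. A hijacking attack is a prompt-manipulation attack in which the adversary's goal is to manipulate the prompt so as to force the transformer to produce a specific output. We first prove that single-layer linear transformers, known to implement gradient descent in-context, are non-robust and can be made to output arbitrary predictions by perturbing a single example in the in-context training set. Experiments show that these attacks succeed on linear transformers but do not transfer to more complex transformers with GPT-2 architectures. Nevertheless, we show that these transformers can be hijacked with gradient-based adversarial attacks. Next, we demonstrate that adversarial training improves the robustness of transformers against hijacking attacks, even when applied only during fine-tuning. Furthermore, in some settings, adversarial training against a weaker attack model can yield robustness to a stronger attack model. Finally, we study the transferability of hijacking attacks across transformers of varying scales and initialization seeds, and between transformers and ordinary least squares (OLS). We find that attacks transfer effectively between small-scale transformers but transfer poorly in the other scenarios (small-to-large scale, large-to-large scale, and between transformers and OLS).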


List of related papers