Fugu-MT Paper Translation (Abstract): PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers

Paper overview: PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers

arxiv url: http://arxiv.org/abs/2604.20047v1
Date: Tue, 21 Apr 2026 23:04:49 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.883185
Title: PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
Authors: Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang, Georgios Smaragdakis
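Abstract summary: The authors observe that a patch-wise trigger can achieve high attack effectiveness when activating backdoors across neighboring patches. PASTA is a twofold stealthy patch-wise backdoor attack in both the pixel and attention domains.
Reference score (attention level, computed independently by the site): 10.045003770844842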
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision Transformers (ViTs) have achieved remarkable success across vision tasks, yet recent studies show they remain vulnerable to backdoor attacks. Existing patch-wise attacks typically assume a single fixed trigger location during inference to maximize trigger attention. However, they overlook the self-attention mechanism in ViTs, which captures long-range dependencies across patches. In this work, we observe that a patch-wise trigger can achieve high attack effectiveness when activating backdoors across neighboring patches, a phenomenon we term the Trigger Radiating Effect (TRE). We further find that inter-patch trigger insertion during training can synergistically enhance TRE compared to single-patch insertion. Prior ViT-specific attacks that maximize trigger attention often sacrifice visual and attention stealthiness, making them detectable. Based on these insights, we propose PASTA, a twofold stealthy patch-wise backdoor attack in both pixel and attention domains. PASTA enables backdoor activation when the trigger is placed at arbitrary patches during inference. To achieve this, we introduce a multi-location trigger insertion strategy to enhance TRE. However, preserving stealthiness while maintaining strong TRE is challenging, as TRE is weakened under stealthy constraints. We therefore formulate a bi-level optimization problem and propose an adaptive backdoor learning framework, where the model and trigger iteratively adapt to each other to avoid local optima. Extensive experiments show that PASTA achieves 99.13% attack success rate across arbitrary patches on average, while significantly improving visual and attention stealthiness (144.43x and 18.68x) and robustness (2.79x) against state-of-the-art ViT defenses across four datasets, outperforming CNN- and ViT-based baselines.