Fugu-MT Paper Translation (Abstract): Targeted Downstream-Agnostic Attack

Paper overview: Targeted Downstream-Agnostic Attack

arxiv url: http://arxiv.org/abs/2605.19446v1
Date: Tue, 19 May 2026 07:00:36 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.174714
Title: Targeted Downstream-Agnostic Attack
Title (reference translation): Targeted Downstream-Agnostic Attack
Authors: Zhuxin Lei, Ziyuan Yang, Yi Zhang
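Abstract summary: Pre-trained encoders are vulnerable to downstream-agnostic attacks (DAAs). This paper introduces a novel component called the 'threat image', which the attacker selects in advance as the target. By using the threat image as a feature-level anchor, the method builds a task-agnostic bridge that reveals the vulnerabilities of the victim encoder.
Reference score (independently computed attention score): 5.00483763729881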
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, pre-trained encoders have gained widespread use due to their strong capability in representation extraction. However, they are vulnerable to downstream-agnostic attacks (DAAs). Existing DAA methods operate under a permissive threat model, where an attack is successful if the generated downstream-agnostic adversarial examples (DAEs) change the original prediction, without requiring a specific target. In this paper, we propose a Targeted DAA (TDAA) method under a stricter threat model requiring the attack to be both targeted and downstream-agnostic. Since the downstream task is unknown and encoders do not directly produce predictions, achieving a targeted attack is particularly challenging. To address this, we introduce a novel component termed the 'threat image', pre-selected by the attacker as the target. Specifically, a generator is designed to produce example-specific adversarial perturbations that compel the victim encoder to output identical features for both the DAEs and the threat image. Unlike previous DAA methods that generate a single shared perturbation for all samples, which often fails due to image diversity, our method adopts an example-specific paradigm. This generates tailored perturbations for each image to ensure a high attack success rate and invisibility. By leveraging the threat image as a feature-level anchor, our method builds a task-agnostic bridge to reveal the vulnerabilities of the victim encoder. Extensive experiments on 10 self-supervised methods across 3 benchmark datasets demonstrate the effectiveness of our approach and reveal the pronounced vulnerability of pre-trained encoders. The code will be made publicly available after the review period.
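Abstract (reference translation): In recent years, pre-trained encoders have come into wide use because of their strong representation-extraction capability. However, they are vulnerable to downstream-agnostic attacks (DAAs). Existing DAA methods operate under a permissive threat model, in which an attack counts as successful if the generated downstream-agnostic adversarial examples (DAEs) change the original prediction, with no specific target required. This paper proposes a Targeted DAA (TDAA) method under a stricter threat model that requires the attack to be both targeted and downstream-agnostic. Because the downstream task is unknown and encoders do not directly produce predictions, achieving a targeted attack is particularly difficult. To address this, the paper introduces a novel component called the 'threat image', which the attacker selects in advance as the target. Specifically, a generator is designed to produce example-specific adversarial perturbations that force the victim encoder to output identical features for both the DAEs and the threat image. Previous DAA methods generate a single shared perturbation for all samples, which often fails because of image diversity; this method instead adopts an example-specific paradigm. It produces a perturbation tailored to each image, ensuring both a high attack success rate and invisibility. By using the threat image as a feature-level anchor, the method builds a task-agnostic bridge that reveals the vulnerabilities of the victim encoder. Extensive experiments on 10 self-supervised methods across 3 benchmark datasets demonstrate the effectiveness of the approach and reveal the pronounced vulnerability of pre-trained encoders. The code will be made publicly available after the review period.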
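
The abstract describes perturbations that pull the encoder features of an input toward those of an attacker-chosen threat image. The following is a minimal sketch of that feature-matching idea only, not the authors' method: it swaps the paper's generator for per-example projected gradient descent, uses a toy linear map as a stand-in for the victim encoder, and assumes a squared L2 feature distance and an L-infinity budget `eps`. The names `encode`, `feature_gap`, and `craft_perturbation` are hypothetical and do not come from the paper.

```python
import numpy as np


def encode(W, x):
    """Toy stand-in for a frozen pre-trained encoder: a fixed linear map."""
    return W @ x


def feature_gap(W, x_adv, threat):
    """Squared L2 distance between the features of x_adv and the threat image."""
    diff = encode(W, x_adv) - encode(W, threat)
    return float(diff @ diff)


def craft_perturbation(W, x, threat, eps=8 / 255, steps=100):
    """Projected gradient descent on the feature gap for a single example.

    Returns delta with |delta_i| <= eps and x + delta inside [0, 1].
    """
    # Step size 1/L, where L = 2 * sigma_max(W)^2 is the Lipschitz constant
    # of the gradient of the quadratic feature gap.
    step = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
    # Per-pixel bounds: the eps budget intersected with the valid pixel range.
    lower = np.maximum(-eps, -x)
    upper = np.minimum(eps, 1.0 - x)
    target_feat = encode(W, threat)
    delta = np.zeros_like(x)
    for _ in range(steps):
        residual = encode(W, x + delta) - target_feat
        grad = 2.0 * W.T @ residual  # gradient of ||residual||^2 w.r.t. delta
        delta = np.clip(delta - step * grad, lower, upper)
    return delta


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))    # hypothetical encoder weights (assumption)
x = rng.uniform(size=64)         # flattened clean image in [0, 1]
threat = rng.uniform(size=64)    # flattened attacker-chosen threat image
delta = craft_perturbation(W, x, threat)
print(feature_gap(W, x, threat), feature_gap(W, x + delta, threat))
```

Because the perturbation is optimized purely in feature space, nothing in the sketch refers to a downstream classifier, which is the sense in which such an attack is downstream-agnostic. A real encoder is nonlinear, so the fixed step size used here would no longer guarantee that the feature gap decreases at every step.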

Related papers