Fugu-MT Paper Translation (Abstract): Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

Paper summary: Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

arxiv url: http://arxiv.org/abs/2511.14301v1
Date: Tue, 18 Nov 2025 09:56:16 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.040386
Title: Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion
Authors: Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie
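Abstract summary: Transformer models are foundational to natural language processing (NLP) applications but remain vulnerable to backdoor attacks. We introduce SteganoBackdoor, which brings stealth techniques back into line with practical threat models. SteganoBackdoor achieves an attack success rate above 99%.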
Reference score (independently computed attention score): 33.35232947017276
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks introduced through poisoned data, which implant hidden behaviors during training. To strengthen the ability to prevent such compromises, recent research has focused on designing increasingly stealthy attacks to stress-test existing defenses, pairing backdoor behaviors with stylized artifact or token-level perturbation triggers. However, this trend diverts attention from the harder and more realistic case: making the model respond to semantic triggers such as specific names or entities, where a successful backdoor could manipulate outputs tied to real people or events in deployed systems. Motivated by this growing disconnect, we introduce SteganoBackdoor, bringing stealth techniques back into line with practical threat models. Leveraging innocuous properties from natural-language steganography, SteganoBackdoor applies a gradient-guided data optimization process to transform semantic trigger seeds into steganographic carriers that embed a high backdoor payload, remain fluent, and exhibit no representational resemblance to the trigger. Across diverse experimental settings, SteganoBackdoor achieves over 99% attack success at an order-of-magnitude lower data-poisoning rate than prior approaches while maintaining unparalleled evasion against a comprehensive suite of data-level defenses. By revealing this practical and covert attack, SteganoBackdoor highlights an urgent blind spot in current defenses and demands immediate attention to adversarial data defenses and real-world threat modeling.
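The abstract reports results in terms of attack success and data-poisoning rate but does not define either quantity. The sketch below shows how these two metrics are commonly computed in backdoor evaluations. It is an illustration under assumed conventions, not the paper's evaluation code: the function names, the integer label encoding, and the choice to skip inputs whose true label already equals the target are all assumptions.

```python
from typing import Sequence


def poisoning_rate(num_poisoned: int, num_total: int) -> float:
    """Fraction of training examples that the attacker has modified."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_poisoned / num_total


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Share of trigger-bearing test inputs classified as the target label.

    Assumption: inputs whose true label already equals the target are
    skipped, since predicting the target for them is not evidence of a
    working backdoor.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")
    # Keep only predictions for inputs that should NOT map to the target.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no inputs with a non-target true label")
    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)
```

As a usage example with made-up numbers, `poisoning_rate(5, 1000)` gives 0.005, and `attack_success_rate([1, 1, 0, 1], [0, 0, 0, 1], target_label=1)` counts two hits among the three eligible inputs.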

Related papers