Fugu-MT Paper Translation (Abstract): Rethinking Textual Adversarial Defense for Pre-trained Language Models

Paper abstract: Rethinking Textual Adversarial Defense for Pre-trained Language Models

arxiv url: http://arxiv.org/abs/2208.10251v1
Date: Thu, 21 Jul 2022 07:51:45 GMT
Status: Translation complete
Last updated in system: 2022-08-28 22:34:15.960653
Title: Rethinking Textual Adversarial Defense for Pre-trained Language Models
Title (reference translation): Rethinking Textual Adversarial Defense for Pre-trained Language Models
Authors: Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao
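Abstract summary: Prior literature shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks. This paper proposes a novel metric (Degree of Anomaly) as a constraint that lets current adversarial attack approaches generate more natural and imperceptible adversarial examples. We show that our universal defense framework achieves after-attack accuracy comparable to or higher than that of other specific defenses.
Reference score (independently computed attention score): 79.18455635071817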
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels (sentence / word / character), adversarial attacks can fool PrLMs to generate incorrect predictions, which questions the robustness of PrLMs. However, we find that most existing textual adversarial examples are unnatural, which can be easily distinguished by both human and machine. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drastically decreases, which reveals that the robustness of PrLMs is not as fragile as they claimed. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Based on anomaly detector and randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowing the specific attack. Empirical results show that our universal defense framework achieves comparable or even higher after-attack accuracy with other specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks, and indicates that (1) further works of adversarial attacks should focus more on how to overcome the detection and resist the randomization, otherwise their adversarial examples would be easily detected and invalidated; and (2) compared with the unnatural and perceptible adversarial examples, it is those undetectable adversarial examples that pose real risks for PrLMs and require more attention for future robustness-enhancing strategies.
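Abstract (reference translation): Although pre-trained language models (PrLMs) have achieved great success, recent studies show that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations at different levels (sentence / word / character), adversarial attacks fool PrLMs into producing incorrect predictions, which calls the robustness of PrLMs into question. However, most existing textual adversarial examples are unnatural and can be easily distinguished by both humans and machines. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint that enables current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drops drastically, revealing that PrLMs are not as fragile as previously claimed. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Based on anomaly detection and randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowing the specific attack. Empirical results show that our universal defense framework achieves after-attack accuracy comparable to or higher than that of other specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks, and indicates that (1) further works of adversarial attacks should focus more on how to overcome the detection and resist the randomization, otherwise their adversarial examples would be easily detected and invalidated; and (2) compared with the unnatural and perceptible adversarial examples, it is those undetectable adversarial examples that pose real risks for PrLMs and require more attention for future robustness-enhancing strategies.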
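The abstract names two ingredients for the universal defense, an anomaly detector and randomization, but does not say how they are combined. The following is only a minimal sketch of one possible arrangement and not the authors' implementation: the out-of-vocabulary score standing in for Degree of Anomaly, the token-dropping randomization, the threshold, the majority vote, and every function name are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Set


def degree_of_anomaly(tokens: List[str], vocab: Set[str]) -> float:
    """Toy stand-in for an anomaly score: share of out-of-vocabulary tokens.
    (Assumption: the paper's metric comes from a learned anomaly detector.)"""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


def random_drop(tokens: List[str], rng: random.Random, p: float = 0.2) -> List[str]:
    """One hypothetical randomization: drop each token with probability p."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or tokens  # never hand the classifier an empty input


def defended_predict(
    tokens: List[str],
    classify: Callable[[List[str]], str],
    vocab: Set[str],
    threshold: float = 0.2,
    votes: int = 5,
    seed: int = 0,
) -> str:
    """Classify directly if the input looks normal; otherwise classify
    several randomized copies and return the majority label."""
    if degree_of_anomaly(tokens, vocab) <= threshold:
        return classify(tokens)
    rng = random.Random(seed)
    preds = [classify(random_drop(tokens, rng)) for _ in range(votes)]
    return Counter(preds).most_common(1)[0][0]
```

In this sketch the defense never needs to know which attack produced the input: it only inspects the input's anomaly score, which mirrors the attack-agnostic property the abstract claims. Inputs below the threshold bypass randomization entirely, which is one plausible way to keep accuracy on clean inputs unchanged.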
