Fugu-MT Paper Translation (Summary): Defending Against Backdoor Attacks in Natural Language Generation

Paper summary: Defending Against Backdoor Attacks in Natural Language Generation

arxiv url: http://arxiv.org/abs/2106.01810v3
Date: Mon, 9 Oct 2023 15:55:36 GMT
Status: Translation complete
Last updated in system: 2023-10-13 17:33:09.681074
Title: Defending Against Backdoor Attacks in Natural Language Generation
Title (reference translation): Countermeasures Against Backdoor Attacks in Natural Language Generation
Authors: Xiaofei Sun, Xiaoya Li, Yuxian Meng, Xiang Ao, Lingjuan Lyu, Jiwei Li and Tianwei Zhang
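Abstract summary: The paper gives a formal definition of backdoor attack and defense. It studies two important NLG tasks, machine translation and dialog generation. Testing the backward probability of generating sources given targets is found to yield effective defense performance against all of the attack types considered.
Reference score (independently computed attention score): 90.550383621687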
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks, which can cause them to generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested in understanding how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defending strategies against attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. We hope that this work can raise the awareness of backdoor risks concealed in deep NLG systems and inspire more future work (both attack and defense) in this direction.
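Abstract (reference translation): Because neural network models are extremely fragile, current natural language generation (NLG) systems are prone to backdoor attacks and can generate malicious sequences that are sexist or offensive. Unfortunately, little effort has been invested in how backdoor attacks affect current NLG models or in how to defend against them. In this work, by providing a formal definition of backdoor attack and defense, we examine this problem on two important NLG tasks, machine translation and dialog generation. In light of the inherent nature of NLG models (e.g., generating a coherent word sequence for a given context), we design defense strategies against attacks. We find that testing the backward probability of generating the source given the target yields effective defense performance against all attacks and can handle the one-to-many issue in many NLG tasks such as dialog generation. We hope this work raises awareness of the backdoor risks hidden in deep NLG systems and encourages more future work (both attack and defense) in this direction.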
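The abstract states only that testing the backward probability of generating sources given targets works as a defense; it does not give an implementation here. The following is a minimal sketch of that idea under stated assumptions, not the authors' method: the backward model is a hypothetical toy lookup table of p(source token | target token), and the names `backward_log_prob`, `is_suspicious`, `floor`, and `threshold` are invented for illustration. A real system would presumably score the source with a trained target-to-source model instead.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical toy backward model: p(source_token | target_token).
# This table is an assumption for illustration only.
ToyTable = Dict[Tuple[str, str], float]


def backward_log_prob(
    source: Sequence[str],
    target: Sequence[str],
    table: ToyTable,
    floor: float = 1e-6,
) -> float:
    """Average per-token log p(source | target) under the toy table.

    Each source token is scored by its best-matching target token;
    pairs missing from the table get the small probability `floor`.
    """
    if not source or not target:
        raise ValueError("source and target must be non-empty")
    total = 0.0
    for s in source:
        best = max(table.get((s, t), floor) for t in target)
        total += math.log(best)
    return total / len(source)


def is_suspicious(
    source: Sequence[str],
    target: Sequence[str],
    table: ToyTable,
    threshold: float,
) -> bool:
    """Flag an output whose backward score falls below `threshold`."""
    return backward_log_prob(source, target, table) < threshold


# Toy data (assumed): a tiny English/French alignment table.
table = {("hello", "bonjour"): 0.9, ("world", "monde"): 0.8}
source = ["hello", "world"]
clean_output = ["bonjour", "monde"]   # output that explains the source
attacked_output = ["insulte"]         # output unrelated to the source

print(is_suspicious(source, clean_output, table, threshold=-2.0))
print(is_suspicious(source, attacked_output, table, threshold=-2.0))
```

The intuition behind the sketch is that a malicious output triggered by a backdoor tends to be unrelated to the input, so the source is hard to reconstruct from it and the backward score is low. Because the check asks only whether the output explains the source, it does not require a single reference output, which fits the one-to-many setting mentioned in the abstract. The threshold value here is arbitrary and would need tuning on held-out data.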

Related papers