Fugu-MT Paper Translation (Abstract): Watermarking Text Generated by Black-Box Language Models

Paper overview: Watermarking Text Generated by Black-Box Language Models

arxiv url: http://arxiv.org/abs/2305.08883v1
Date: Sun, 14 May 2023 07:37:33 GMT
Status: Translation complete
Last updated in system: 2023-05-17 17:39:58.775115
Title: Watermarking Text Generated by Black-Box Language Models
Title (reference translation): Watermarking of Text by Black-Box Language Models
Authors: Xi Yang, Kejiang Chen, Weiming Zhang, Chang Liu, Yuang Qi, Jie Zhang, Han Fang, Nenghai Yu
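Abstract summary: A watermark-based method has been proposed for white-box LLMs that can embed watermarks during text generation. A detection algorithm aware of the special word list can identify watermarked text. We develop a watermarking framework for black-box language model usage scenarios.
Reference score (independently computed attention score): 103.52541557216766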
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs now exhibit human-like skills in various fields, leading to worries about misuse. Thus, detecting generated text is crucial. However, passive detection methods are stuck in domain specificity and limited adversarial robustness. To achieve reliable detection, a watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation. The method involves randomly dividing the model vocabulary to obtain a special list and adjusting the probability distribution to promote the selection of words in the list. A detection algorithm aware of the list can identify the watermarked text. However, this method is not applicable in many real-world scenarios where only black-box language models are available. For instance, third parties that develop API-based vertical applications cannot watermark text themselves because API providers only supply generated text and withhold probability distributions to shield their commercial interests. To allow third parties to autonomously inject watermarks into generated text, we develop a watermarking framework for black-box language model usage scenarios. Specifically, we first define a binary encoding function to compute a random binary encoding corresponding to a word. The encodings computed for non-watermarked text conform to a Bernoulli distribution, wherein the probability of a word representing bit-1 is approximately 0.5. To inject a watermark, we alter the distribution by selectively replacing words representing bit-0 with context-based synonyms that represent bit-1. A statistical test is then used to identify the watermark. Experiments demonstrate the effectiveness of our method on both Chinese and English datasets. Furthermore, results under re-translation, polishing, word deletion, and synonym substitution attacks reveal that it is arduous to remove the watermark without compromising the original semantics.
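Abstract (reference translation): LLMs now exhibit human-like skills in various fields, raising concerns about misuse. Detecting generated text is therefore essential. However, passive detection methods are limited by domain specificity and weak adversarial robustness. A watermark-based method has been proposed for white-box LLMs that lets them embed watermarks during text generation. This method randomly partitions the model vocabulary to obtain a special list and adjusts the probability distribution to promote the selection of words in that list. A detection algorithm aware of the list can identify watermarked text. However, this method cannot be applied in many real-world scenarios where only black-box language models are available. For example, third parties that develop API-based vertical applications cannot watermark text themselves, because API providers supply only the generated text and withhold probability distributions to protect their commercial interests. To allow third parties to autonomously inject watermarks into generated text, we develop a watermarking framework for black-box language model usage scenarios. Specifically, we first define a binary encoding function that computes a random binary encoding corresponding to a word. Encodings computed on non-watermarked text follow a Bernoulli distribution, in which the probability of a word representing bit-1 is approximately 0.5. To inject a watermark, we alter this distribution by selectively replacing words that represent bit-0 with context-based synonyms that represent bit-1. A statistical test then identifies the watermark. Experiments demonstrate the effectiveness of the method on both Chinese and English datasets. Furthermore, results under re-translation, polishing, word deletion, and synonym substitution attacks show that it is difficult to remove the watermark without compromising the original semantics.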
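The embedding and detection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the abstract: the binary encoding is taken to be the lowest bit of a keyed SHA-256 hash of the word, the context-based synonym generation is replaced by a static lookup table, and the statistical test is taken to be a one-sided z-test against Bernoulli(0.5). The names `bit_of`, `embed`, `z_score`, `is_watermarked`, the key `"demo-key"`, and the threshold `4.0` are all hypothetical.

```python
import hashlib
import math


def bit_of(word: str, key: str = "demo-key") -> int:
    """Hypothetical binary encoding: lowest bit of a keyed SHA-256 hash of the word."""
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode("utf-8")).digest()
    return digest[0] & 1


def embed(words, synonyms, key="demo-key"):
    """Replace each bit-0 word with a bit-1 synonym when the table offers one.

    `synonyms` is a plain dict standing in for context-based synonym
    generation, which this sketch does not attempt (assumption).
    """
    out = []
    for word in words:
        if bit_of(word, key) == 0:
            candidates = [s for s in synonyms.get(word, []) if bit_of(s, key) == 1]
            if candidates:
                word = candidates[0]
        out.append(word)
    return out


def z_score(words, key="demo-key"):
    """z statistic for the null hypothesis that bits follow Bernoulli(0.5)."""
    n = len(words)
    if n == 0:
        return 0.0
    ones = sum(bit_of(w, key) for w in words)
    return (ones - 0.5 * n) / math.sqrt(0.25 * n)


def is_watermarked(words, key="demo-key", threshold=4.0):
    """Flag text whose share of bit-1 words is implausibly high under the null."""
    return z_score(words, key) > threshold
```

In this sketch, unmarked text should yield a z statistic near zero, while text in which most bit-0 words were replaced yields a large positive value. The threshold trades false positives against missed detections and would need tuning on real data.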


Related papers