Fugu-MT Paper Translation (Summary): Detecting Post-generation Edits to Watermarked LLM Outputs via Combinatorial Watermarking

Paper summary: Detecting Post-generation Edits to Watermarked LLM Outputs via Combinatorial Watermarking

arxiv url: http://arxiv.org/abs/2510.01637v1
Date: Thu, 02 Oct 2025 03:33:12 GMT
Status: Translation complete
System update date: 2025-10-03 16:59:20.971343
Title: Detecting Post-generation Edits to Watermarked LLM Outputs via Combinatorial Watermarking
Authors: Liyan Xie, Muhammad Siddeek, Mohamed Seif, Andrea J. Goldsmith, Mengdi Wang
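Abstract summary: The work detects post-generation edits made locally to watermarked LLM outputs. It proposes a pattern-based watermarking framework that partitions the vocabulary into subsets and embeds the watermark through them. The method is evaluated on open-source LLMs across a variety of editing scenarios and shows strong empirical performance in edit localization.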
Reference score (attention score computed by the site): 51.417096446156926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Watermarking has become a key technique for proprietary language models, enabling the distinction between AI-generated and human-written text. However, in many real-world scenarios, LLM-generated content may undergo post-generation edits, such as human revisions or even spoofing attacks, making it critical to detect and localize such modifications. In this work, we introduce a new task: detecting post-generation edits locally made to watermarked LLM outputs. To this end, we propose a combinatorial pattern-based watermarking framework, which partitions the vocabulary into disjoint subsets and embeds the watermark by enforcing a deterministic combinatorial pattern over these subsets during generation. We accompany the combinatorial watermark with a global statistic that can be used to detect the watermark. Furthermore, we design lightweight local statistics to flag and localize potential edits. We introduce two task-specific evaluation metrics, Type-I error rate and detection accuracy, and evaluate our method on open-source LLMs across a variety of editing scenarios, demonstrating strong empirical performance in edit localization.
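The abstract does not specify how the vocabulary is partitioned, which pattern is enforced, or how the statistics are defined. The Python sketch below is therefore a toy illustration under stated assumptions, not the authors' method. It makes five choices. The vocabulary is split into K disjoint subsets by a keyed hash (`subset_of` and `KEY` are hypothetical). The assumed pattern requires each token's subset index to be one more (mod K) than its predecessor's. A uniform sampler stands in for the LLM and obeys the pattern as a hard constraint. The global statistic is a binomial z-score on the number of pattern-consistent token pairs, assuming unwatermarked tokens have roughly uniform, independent subset labels. The local statistic flags every pair that breaks the pattern.

```python
import hashlib
import math
import random

NUM_SUBSETS = 4      # K, number of disjoint vocabulary subsets (assumed value)
VOCAB_SIZE = 1000    # toy vocabulary size (assumed value)
KEY = "demo-key"     # hypothetical secret watermark key


def subset_of(token_id: int) -> int:
    """Keyed hash maps each token to exactly one subset, so the subsets are disjoint."""
    digest = hashlib.sha256(f"{KEY}:{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SUBSETS


SUBSET = [subset_of(v) for v in range(VOCAB_SIZE)]
BY_SUBSET = [[v for v in range(VOCAB_SIZE) if SUBSET[v] == k] for k in range(NUM_SUBSETS)]


def next_subset(prev_subset: int) -> int:
    """Hypothetical deterministic pattern: subsets cycle 0 -> 1 -> ... -> K-1 -> 0."""
    return (prev_subset + 1) % NUM_SUBSETS


def generate(length: int, rng: random.Random) -> list[int]:
    """Toy stand-in for an LLM: samples uniformly, but only from the subset the pattern allows."""
    tokens = [rng.randrange(VOCAB_SIZE)]  # first token is unconstrained
    while len(tokens) < length:
        allowed = BY_SUBSET[next_subset(SUBSET[tokens[-1]])]
        tokens.append(rng.choice(allowed))
    return tokens


def matches(tokens: list[int]) -> list[bool]:
    """Entry i is True when the pair (tokens[i], tokens[i+1]) follows the pattern."""
    return [SUBSET[b] == next_subset(SUBSET[a]) for a, b in zip(tokens, tokens[1:])]


def global_z(tokens: list[int]) -> float:
    """Global statistic: z-score of the match count against Binomial(m, 1/K)."""
    hits = matches(tokens)
    m, p = len(hits), 1 / NUM_SUBSETS
    return (sum(hits) - m * p) / math.sqrt(m * p * (1 - p))


def flag_edits(tokens: list[int]) -> list[int]:
    """Local statistic: positions i where the pair (tokens[i-1], tokens[i]) breaks the pattern."""
    return [i + 1 for i, ok in enumerate(matches(tokens)) if not ok]


rng = random.Random(0)
text = generate(200, rng)
edited = list(text)
# Substitute position 100 with a token two subsets ahead, which breaks both adjacent pairs.
edited[100] = rng.choice(BY_SUBSET[(SUBSET[text[100]] + 2) % NUM_SUBSETS])
print(round(global_z(text), 1), flag_edits(edited))
```

Because the assumed pattern ties each token only to its predecessor, an edit disturbs just the neighboring pairs. The substitution above is flagged at positions 100 and 101, and deleting a token is flagged at the position where the gap closes. A rule keyed to absolute positions would instead misalign every token after an insertion or deletion. Under this hard-constraint assumption, a substituted token that happens to fall in its original subset (probability about 1/K) goes undetected.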

