Fugu-MT Paper Translation (Summary): PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs

Paper Summary: PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs

arxiv url: http://arxiv.org/abs/2510.23891v1
Date: Mon, 27 Oct 2025 22:00:49 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.595852
Title: PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs
Authors: Jiaqi Xue, Yifei Zhao, Mansour Al Ghanim, Shangqian Gao, Ruimin Sun, Qian Lou, Mengxin Zheng
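Abstract summary: This paper proposes PRO, a precise and robust text watermarking method for open-source models. PRO substantially improves both watermark detectability and resilience to model modifications.
Reference score (site-computed attention score): 33.70483974998233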
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text watermarking for large language models (LLMs) enables model owners to verify text origin and protect intellectual property. While watermarking methods for closed-source LLMs are relatively mature, extending them to open-source models remains challenging, as developers cannot control the decoding process. Consequently, owners of open-source LLMs lack practical means to verify whether text was generated by their models. A core difficulty lies in embedding watermarks directly into model weights without hurting detectability. A promising idea is to distill watermarks from a closed-source model into an open one, but this suffers from (i) poor detectability due to mismatch between learned and predefined patterns, and (ii) fragility to downstream modifications such as fine-tuning or model merging. To overcome these limitations, we propose PRO, a Precise and Robust text watermarking method for open-source LLMs. PRO jointly trains a watermark policy model with the LLM, producing patterns that are easier for the model to learn and more consistent with detection criteria. A regularization term further simulates downstream perturbations and penalizes degradation in watermark detectability, ensuring robustness under model edits. Experiments on open-source LLMs (e.g., LLaMA-3.2, LLaMA-3, Phi-2) show that PRO substantially improves both watermark detectability and resilience to model modifications.
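Point (i) of the abstract turns on how well learned patterns match the detection criterion. For context, the sketch below shows one widely used criterion, the green-list z-test from the decoding-time watermark of Kirchenbauer et al. (2023). It is not PRO's own detector. It assumes a simplified per-token hash in place of the original vocabulary partition, and the names `is_green`, `watermark_z_score`, `gamma`, and `key` are hypothetical.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, gamma: float = 0.25, key: str = "secret") -> bool:
    """Return True if `token` falls in the green list seeded by `prev_token`.

    Simplified, hypothetical hashing: each (key, prev_token, token) triple is
    hashed to a uniform value in [0, 1), so a token is green with probability
    `gamma`. (The original scheme instead partitions the vocabulary with a PRNG
    seeded by the previous token.)
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return u < gamma


def watermark_z_score(tokens: list[int], gamma: float = 0.25, key: str = "secret") -> float:
    """One-proportion z-test on the count of green tokens.

    Under the null hypothesis (text written without the watermark), each scored
    token is green with probability `gamma`, so the green count has mean
    gamma * n and variance n * gamma * (1 - gamma).
    """
    n = len(tokens) - 1  # the first token has no predecessor, so it is not scored
    if n <= 0:
        raise ValueError("need at least two tokens to score")
    green = sum(is_green(p, t, gamma, key) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


if __name__ == "__main__":
    # Usage: build a fully "green" sequence greedily, then score it.
    seq = [0]
    for _ in range(50):
        seq.append(next(t for t in range(10_000) if is_green(seq[-1], t)))
    # All 50 scored positions are green, so z = sqrt(50 * 0.75 / 0.25) = sqrt(150).
    print(watermark_z_score(seq))
```

A watermark distilled into model weights is only detectable if the open model, sampling on its own, keeps producing a surplus of green tokens; edits such as fine-tuning or merging that erode this surplus pull the z-score back toward 0, which is the fragility described in point (ii). The original green-list paper flags text as watermarked above a threshold of z = 4.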

Related Papers