Fugu-MT Paper Translation (Abstract): Dataset Watermarking for Closed LLMs with Provable Detection

Paper abstract: Dataset Watermarking for Closed LLMs with Provable Detection

arxiv url: http://arxiv.org/abs/2605.06865v1
Date: Thu, 07 May 2026 19:06:35 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.568357
Title: Dataset Watermarking for Closed LLMs with Provable Detection
Title (reference translation): Dataset Watermarking for Closed LLMs with Provable Detection
Authors: Pengrun Huang, Kamalika Chaudhuri, Yu-Xiang Wang
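Abstract summary: Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data. This motivates the need for dataset watermarking: designing datasets such that training on them leaves detectable signatures in the resulting model. We introduce the first dataset watermarking method for closed LLMs with provable detection.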
Reference score (independently computed attention metric): 45.743499931376704
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same benchmarks used for evaluation. This motivates the need for dataset watermarking: designing datasets such that training on them leaves detectable signatures in the resulting model. Prior work has explored this problem for open models. We introduce the first dataset watermarking method for closed LLMs with provable detection. In particular, we embed a dataset-level watermark signal by increasing the co-occurrence frequency of randomly selected word pairs through rephrasing, and detect it using a statistical test on co-occurrence patterns in model-generated outputs. We evaluate our method with multiple base models and benchmark datasets and show that it reliably detects the watermark ($p <0.01$) in the fine-tuning stage. Notably, our method remains effective in a data mixture setting where the watermarked dataset constitutes only approximately $1\%$ of the total fine-tuning tokens. Furthermore, we show that our method preserves the utility and semantic integrity of the benchmark.
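Abstract (reference translation): Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, so these models may have been trained on proprietary datasets or on the same benchmarks used to evaluate them. This motivates the need for dataset watermarking: designing datasets such that training on them leaves a detectable signature in the resulting model. Prior work has explored this problem for open models. We propose the first dataset watermarking method for closed LLMs with provable detection. Specifically, we embed a dataset-level watermark signal by raising the co-occurrence frequency of randomly selected word pairs through rephrasing, and detect it with a statistical test on co-occurrence patterns in model-generated outputs. We evaluate the method with multiple base models and benchmark datasets and show that it reliably detects the watermark ($p < 0.01$) in the fine-tuning stage. Notably, the method remains effective in a data mixture setting where the watermarked dataset makes up only about $1\%$ of the total fine-tuning tokens. Furthermore, we show that the method preserves the utility and semantic integrity of the benchmark.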
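Illustrative sketch (not from the paper): the abstract says the watermark is detected with a statistical test on word-pair co-occurrence in model-generated outputs, but it does not specify the test. The Python below shows one simple way such a check could look, assuming a one-sided exact binomial test against a known baseline co-occurrence rate. The function names, the `baseline_rate` parameter, the tokenization, and the independence assumption across (text, pair) trials are all hypothetical choices made for illustration.

```python
import math
import random
import re


def sample_word_pairs(vocab, num_pairs, seed):
    """Pick random word pairs using a secret seed (assumed to act as the watermark key)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(vocab, 2)) for _ in range(num_pairs)]


def count_cooccurrences(texts, pairs):
    """Count (text, pair) combinations where both words of the pair appear in the text."""
    hits = 0
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        hits += sum(1 for a, b in pairs if a in words and b in words)
    return hits


def binomial_tail(k, n, p):
    """Exact P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_p_value(texts, pairs, baseline_rate):
    """One-sided p-value for 'pairs co-occur more often than baseline_rate'.

    Assumption: each (text, pair) combination is treated as an independent
    Bernoulli trial with success probability baseline_rate under the null.
    """
    n = len(texts) * len(pairs)
    k = count_cooccurrences(texts, pairs)
    return binomial_tail(k, n, baseline_rate)


# Toy usage with made-up outputs and a made-up baseline rate.
pairs = [("river", "lantern"), ("copper", "meadow")]
outputs = [
    "The river glowed under a lantern.",
    "Copper fences ringed the meadow.",
    "Nothing to see here.",
]
p_value = detection_p_value(outputs, pairs, baseline_rate=0.01)
print(f"p-value: {p_value:.5f}")
```

In this toy setup a small p-value would suggest that the selected pairs co-occur more often than the assumed baseline. A real detector would need a carefully estimated null rate and a test whose assumptions actually hold for generated text.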


Related papers