Fugu-MT Paper Translation (Summary): Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique

Paper summary: Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique

arxiv url: http://arxiv.org/abs/2510.09655v1
Date: Tue, 07 Oct 2025 08:34:08 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.523237
Title: Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique
Title (reference translation): Data Provenance Auditing of Fine-Tuned Large Language Models Using a Text-Preserving Technique
Authors: Yanming Li, Seifeddine Ghozzi, Cédric Eichler, Nicolas Anciaux, Alexandra Bensamoun, Lorena Gonzalez Manzano
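Abstract summary: We introduce a text-preserving watermarking framework that embeds invisible Unicode characters into documents. We experimentally observe a failure rate below 0.1% when detecting a reply after fine-tuning with 50 marked documents. No spurious reply was recovered in over 18,000 challenges, corresponding to 100% TPR at 0% FPR.
Reference score (independently computed attention measure): 36.96848724920411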
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the problem of auditing whether sensitive or copyrighted texts were used to fine-tune large language models (LLMs) under black-box access. Prior signals (verbatim regurgitation and membership inference) are unreliable at the level of individual documents or require altering the visible text. We introduce a text-preserving watermarking framework that embeds sequences of invisible Unicode characters into documents. Each watermark is split into a cue (embedded in odd chunks) and a reply (embedded in even chunks). At audit time, we submit prompts that contain only the cue; the presence of the corresponding reply in the model's output provides evidence of memorization consistent with training on the marked text. To obtain sound decisions, we compare the score of the published watermark against a held-out set of counterfactual watermarks and apply a ranking test with a provable false-positive-rate bound. The design is (i) minimally invasive (no visible text changes), (ii) scalable to many users and documents via a large watermark space and multi-watermark attribution, and (iii) robust to common passive transformations. We evaluate on open-weight LLMs and multiple text domains, analyzing regurgitation dynamics, sensitivity to training set size, and interference under multiple concurrent watermarks. Our results demonstrate reliable post-hoc provenance signals with bounded FPR under black-box access. We experimentally observe a failure rate of less than 0.1% when detecting a reply after fine-tuning with 50 marked documents. Conversely, no spurious reply was recovered in over 18,000 challenges, corresponding to 100% TPR at 0% FPR. Moreover, detection rates remain relatively stable as the dataset size increases, maintaining a per-document detection rate above 45% even when the marked collection accounts for less than 0.33% of the fine-tuning data.
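Abstract (reference translation): We address the problem of auditing, under black-box access, whether sensitive or copyrighted texts were used to fine-tune large language models (LLMs). Earlier signals, namely verbatim regurgitation and membership inference, are either unreliable for individual documents or require changing the visible text. We propose a text-preserving watermarking framework that embeds sequences of invisible Unicode characters into documents. Each watermark is divided into a cue (embedded in odd chunks) and a reply (embedded in even chunks). At audit time, we send prompts containing only the cue; if the corresponding reply appears in the model output, this is evidence of memorization consistent with training on the marked text. To reach sound decisions, we compare the score of the published watermark with a held-out set of counterfactual watermarks and apply a ranking test with a provable false-positive-rate bound. The design is (i) minimally invasive (no visible text changes), (ii) scalable to many users and documents through a large watermark space and multi-watermark attribution, and (iii) robust to common passive transformations. We evaluate on open-weight LLMs and several text domains, analyzing regurgitation dynamics, sensitivity to training set size, and interference among multiple concurrent watermarks. The results show reliable post-hoc provenance signals with bounded FPR under black-box access. We experimentally observe a failure rate below 0.1% when detecting a reply after fine-tuning with 50 marked documents. Conversely, no spurious reply was recovered in over 18,000 challenges, corresponding to 100% TPR at 0% FPR. In addition, detection rates stay relatively stable as the dataset grows, with a per-document detection rate above 45% even when the marked collection makes up less than 0.33% of the fine-tuning data.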
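
The cue/reply embedding and the ranking test described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two zero-width code points used as the invisible alphabet, the word-count chunking, the helper names (`encode_bits`, `embed`, `strip_invisible`, `rank_p_value`), and the use of the standard rank (permutation-style) p-value as the test. The abstract does not give the paper's actual alphabet, chunking rule, or scoring function.

```python
from typing import Sequence

# Assumption: two zero-width code points stand in for the invisible alphabet.
ZERO_WIDTH = {"0": "\u200b", "1": "\u200c"}  # ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER
INVISIBLE = set(ZERO_WIDTH.values())


def encode_bits(bits: str) -> str:
    """Map a bit string such as '1011' to a run of invisible characters."""
    return "".join(ZERO_WIDTH[b] for b in bits)


def embed(text: str, cue: str, reply: str, chunk_words: int = 10) -> str:
    """Append the cue to odd chunks and the reply to even chunks (1-indexed).

    Hypothetical chunking: fixed-size groups of space-separated words.
    """
    words = text.split(" ")
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    marked = []
    for index, chunk in enumerate(chunks, start=1):
        mark = cue if index % 2 == 1 else reply
        marked.append(chunk + mark)
    return " ".join(marked)


def strip_invisible(text: str) -> str:
    """Remove the invisible characters, recovering the visible text."""
    return "".join(ch for ch in text if ch not in INVISIBLE)


def rank_p_value(published_score: float,
                 counterfactual_scores: Sequence[float]) -> float:
    """Standard rank p-value: (1 + #{counterfactuals >= published}) / (K + 1).

    If the published watermark is exchangeable with the K counterfactual ones
    when the model never saw the marked text, rejecting at p <= alpha keeps
    the false-positive rate at or below alpha.
    """
    at_least = sum(1 for s in counterfactual_scores if s >= published_score)
    return (1 + at_least) / (1 + len(counterfactual_scores))


cue = encode_bits("1011")
reply = encode_bits("0110")
doc = " ".join(f"w{i}" for i in range(50))
marked = embed(doc, cue, reply)
```

In this sketch the visible text is unchanged (`strip_invisible(marked) == doc`), and an auditor would score how strongly the model emits `reply` when prompted with `cue`, then compare that score against scores for held-out counterfactual watermarks via `rank_p_value`.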

Related papers