Fugu-MT Paper Translation (Abstract): Blackbox Model Provenance via Palimpsestic Membership Inference

Paper summary: Blackbox Model Provenance via Palimpsestic Membership Inference

arxiv url: http://arxiv.org/abs/2510.19796v1
Date: Wed, 22 Oct 2025 17:30:39 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:16.234057
Title: Blackbox Model Provenance via Palimpsestic Membership Inference
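Title (reference translation): Blackbox Model Provenance via Palimpsestic Membership Inference
Authors: Rohith Kuditipudi, Jing Huang, Sally Zhu, Diyi Yang, Christopher Potts, Percy Liang
Abstract summary: Alice trains an open-weight language model, and Bob uses a blackbox derivative of Alice's model to generate text. Can Alice prove that Bob is using her model, either by querying Bob's derivative model or from the text alone? The authors use test statistics that capture the correlation between Bob's model or text and the ordering of training examples in Alice's training run.
Reference score (attention measure computed independently by the site): 96.73342141272549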
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Suppose Alice trains an open-weight language model and Bob uses a blackbox derivative of Alice's model to produce text. Can Alice prove that Bob is using her model, either by querying Bob's derivative model (query setting) or from the text alone (observational setting)? We formulate this question as an independence testing problem--in which the null hypothesis is that Bob's model or text is independent of Alice's randomized training run--and investigate it through the lens of palimpsestic memorization in language models: models are more likely to memorize data seen later in training, so we can test whether Bob is using Alice's model using test statistics that capture correlation between Bob's model or text and the ordering of training examples in Alice's training run. If Alice has randomly shuffled her training data, then any significant correlation amounts to exactly quantifiable statistical evidence against the null hypothesis, regardless of the composition of Alice's training data. In the query setting, we directly estimate (via prompting) the likelihood Bob's model gives to Alice's training examples and order; we correlate the likelihoods of over 40 fine-tunes of various Pythia and OLMo base models ranging from 1B to 12B parameters with the base model's training data order, achieving a p-value on the order of at most 1e-8 in all but six cases. In the observational setting, we try two approaches based on estimating 1) the likelihood of Bob's text overlapping with spans of Alice's training examples and 2) the likelihood of Bob's text with respect to different versions of Alice's model we obtain by repeating the last phase (e.g., 1%) of her training run on reshuffled data. The second approach can reliably distinguish Bob's text from as little as a few hundred tokens; the first does not involve any retraining but requires many more tokens (several hundred thousand) to achieve high power.
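The query-setting test described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the choice of Spearman rank correlation, the Monte Carlo permutation p-value, the function names (`ranks`, `spearman`, `order_correlation_test`), and the synthetic log-likelihoods are all assumptions made for illustration. In practice the log-likelihoods would come from prompting Bob's model on Alice's training examples.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def order_correlation_test(train_positions, log_likelihoods, n_perm=1000, seed=0):
    """One-sided permutation test (illustrative, not the paper's statistic).

    Null hypothesis: the likelihoods are independent of Alice's random
    shuffle, so every ordering of train_positions is equally likely.
    Alternative: examples seen later in training get higher likelihood.
    """
    rng = random.Random(seed)
    observed = spearman(train_positions, log_likelihoods)
    shuffled = list(train_positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(shuffled, log_likelihoods) >= observed:
            hits += 1
    # The +1 terms keep the Monte Carlo p-value valid and strictly positive.
    return observed, (1 + hits) / (1 + n_perm)


# Synthetic stand-ins for log-likelihoods (hypothetical data, not real results).
data_rng = random.Random(1)
positions = list(range(200))  # position of each example in Alice's training order
derived = [0.01 * t + data_rng.gauss(0, 0.5) for t in positions]  # later = likelier
independent = [data_rng.gauss(0, 0.5) for _ in positions]  # no link to the order

rho_derived, p_derived = order_correlation_test(positions, derived)
rho_indep, p_indep = order_correlation_test(positions, independent)
print("derived model:    ", rho_derived, p_derived)
print("independent model:", rho_indep, p_indep)
```

With only 1000 permutations the smallest attainable p-value is about 1e-3, so reaching the 1e-8 scale reported in the abstract would require an analytic approximation or far more permutations than this sketch uses.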
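Abstract (reference translation): Alice trains an open-weight language model, and Bob uses a blackbox derivative of Alice's model to generate text. Can Alice prove that Bob is using her model, either by querying Bob's derivative model (query setting) or from the text alone (observational setting)? We formulate this question as an independence testing problem, in which the null hypothesis is that Bob's model or text is independent of Alice's randomized training run, and investigate it through the lens of palimpsestic memorization in language models: models are more likely to memorize data seen later in training, so we can test whether Bob is using Alice's model with test statistics that capture the correlation between Bob's model or text and the ordering of training examples in Alice's training run. If Alice has randomly shuffled her training data, then any significant correlation amounts to exactly quantifiable statistical evidence against the null hypothesis, regardless of the composition of Alice's training data. In the query setting, we directly estimate the likelihood that Bob's model assigns to Alice's training examples and order; we correlate the likelihoods of over 40 fine-tunes of Pythia and OLMo base models, ranging from 1B to 12B parameters, with the base model's training data order, achieving a p-value on the order of at most 1e-8. In the observational setting, we try two approaches based on estimating 1) the likelihood of Bob's text overlapping with spans of Alice's training examples and 2) the likelihood of Bob's text with respect to different versions of Alice's model obtained by repeating the last phase (e.g., 1%) of her training run on reshuffled data. The second approach can reliably distinguish Bob's text from as little as a few hundred tokens.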

List of related papers