Fugu-MT paper translation (abstract): Spiking the training data to correct for test set contamination

Paper overview: Spiking the training data to correct for test set contamination

arxiv url: http://arxiv.org/abs/2605.24818v1
Date: Sun, 24 May 2026 02:06:59 GMT
Status: translation complete
Last updated in system: 2026-05-26 19:50:18.464065
Title: Spiking the training data to correct for test set contamination
Title (reference translation): Spiking the training data to correct for test set contamination
Authors: Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia
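Abstract summary: This work proposes spiking the training data by intentionally contaminating test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization, which enable statistical correction of inflated test scores.
Reference score (attention measure computed independently by this site): 28.940486760749025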
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by intentionally contaminating some test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization which enable principled statistical correction of inflated test scores. To evaluate different correction estimators, we first present a simulation framework based on the Hubble models. Hubble models come in minimal pairs, where the perturbed model was deliberately contaminated with several test sets, while the standard model was not, serving as the counterfactual and correction target. We consider estimators that use information from a memorization predictor, correctness predictor, or both. In simulation, we establish basic statistical intuitions and show that estimators leveraging memorization and correctness information are better than naive estimation which makes no correction at all. We then instantiate several memorization and correctness predictors, and find that simple predictors such as Platt-scaled membership inference metrics provide good signal for correction. Finally, we examine the practical considerations of spiking. Simple memorization predictors need no more than 10 examples for calibration and often transfer from one dataset to another. Taken together, spiking is a promising solution for test set contamination.
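Abstract (reference translation): The literature on test set contamination focuses mainly on detection, while the correction of contaminated test scores remains underexplored. Our core proposal is to spike the training data by intentionally contaminating some of the test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization, which enable statistical correction of inflated test scores. To evaluate different correction estimators, we first present a simulation framework based on the Hubble models. Hubble models come in minimal pairs: the perturbed model was deliberately contaminated with several test sets, while the standard model was not, so it serves as the counterfactual and the correction target. We consider estimators that use information from a memorization predictor, a correctness predictor, or both. In simulation, we establish basic statistical intuitions and show that estimators using memorization and correctness information outperform naive estimation, which makes no correction at all. We then instantiate several memorization and correctness predictors and find that simple predictors, such as Platt-scaled membership inference metrics, provide a good signal for correction. Finally, we examine the practical considerations of spiking. Simple memorization predictors need no more than 10 examples for calibration and often transfer from one dataset to another. Taken together, spiking is a promising solution for test set contamination.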
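The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical membership inference score per test example (higher means more likely seen in training), a small calibration set of spiked examples plus examples known to be clean, and a simple illustrative estimator that weights each example's correctness by its estimated probability of not being memorized. All function names, score distributions, and rates are invented for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Platt scaling: fit p(memorized | score) = sigmoid(a * score + b)
    by plain gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def corrected_accuracy(scores, correct, a, b):
    """Illustrative estimator (an assumption, not taken from the paper):
    average correctness weighted by the estimated probability of being clean."""
    weights = [1.0 - sigmoid(a * s + b) for s in scores]
    return sum(w * c for w, c in zip(weights, correct)) / sum(weights)


def simulate(seed=0, n_test=2000, contamination_rate=0.3):
    """Toy simulation with made-up Gaussian scores and accuracy rates."""
    rng = random.Random(seed)

    # Calibration set: 10 spiked examples (label 1) and 10 known-clean examples (label 0).
    calib_scores = [rng.gauss(2.0, 1.0) for _ in range(10)]
    calib_scores += [rng.gauss(-2.0, 1.0) for _ in range(10)]
    calib_labels = [1] * 10 + [0] * 10
    a, b = fit_platt(calib_scores, calib_labels)

    scores, observed, counterfactual = [], [], []
    for _ in range(n_test):
        contaminated = rng.random() < contamination_rate
        # Outcome an uncontaminated model would have produced on this example.
        base = 1 if rng.random() < 0.6 else 0
        # A contaminated example is usually answered correctly via memorization.
        memorized = contaminated and rng.random() < 0.95
        scores.append(rng.gauss(2.0 if contaminated else -2.0, 1.0))
        observed.append(1 if memorized else base)
        counterfactual.append(base)

    naive = sum(observed) / n_test            # no correction at all
    target = sum(counterfactual) / n_test     # counterfactual (uncontaminated) score
    corrected = corrected_accuracy(scores, observed, a, b)
    return naive, corrected, target


if __name__ == "__main__":
    naive, corrected, target = simulate()
    print(f"naive={naive:.3f} corrected={corrected:.3f} target={target:.3f}")
```

In this toy setup the weighted estimate should land much closer to the counterfactual score than the naive average, because heavily memorized examples receive little weight. The sketch treats clean and contaminated examples as equally difficult, which real benchmarks may not satisfy.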

Related papers