Fugu-MT Paper Translation (Summary): Detecting Distillation Data from Reasoning Models

Paper overview: Detecting Distillation Data from Reasoning Models

arXiv URL: http://arxiv.org/abs/2510.04850v1
Date: Mon, 06 Oct 2025 14:37:02 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.902946
Title: Detecting Distillation Data from Reasoning Models
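Authors: Hengxiang Zhang, Hyeong Kyu Choi, Yixuan Li, Hongxin Wei
Abstract summary: Reasoning distillation has emerged as an efficient and powerful paradigm for enhancing the reasoning capabilities of large language models. However, reasoning distillation may inadvertently cause benchmark contamination: evaluation data included in distillation datasets can inflate the performance metrics of distilled models. In this paper, we propose Token Probability Deviation (TBD), a novel and effective method that leverages the probability patterns of the generated output tokens.
Reference score (attention score computed by the site): 35.042445465049404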
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning distillation has emerged as an efficient and powerful paradigm for enhancing the reasoning capabilities of large language models. However, reasoning distillation may inadvertently cause benchmark contamination, where evaluation data included in distillation datasets can inflate performance metrics of distilled models. In this work, we formally define the task of distillation data detection, which is uniquely challenging due to the partial availability of distillation data. Then, we propose a novel and effective method Token Probability Deviation (TBD), which leverages the probability patterns of the generated output tokens. Our method is motivated by the analysis that distilled models tend to generate near-deterministic tokens for seen questions, while producing more low-probability tokens for unseen questions. Our key idea behind TBD is to quantify how far the generated tokens' probabilities deviate from a high reference probability. In effect, our method achieves competitive detection performance by producing lower scores for seen questions than for unseen questions. Extensive experiments demonstrate the effectiveness of our method, achieving an AUC of 0.918 and a TPR@1% FPR of 0.470 on the S1 dataset.
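
The abstract describes TBD only in words: measure how far the probabilities of the generated tokens deviate from a high reference probability, so that seen questions receive lower scores than unseen ones. The Python sketch below illustrates that idea together with the two reported metrics, AUC and TPR@1% FPR. It is not the authors' implementation. Because the exact TBD formula is not given here, the score is assumed to be the mean shortfall of each token's probability below a fixed reference probability `ref_prob` (a hypothetical parameter; 0.9 is an arbitrary illustrative value), and `token_probs` is assumed to hold the probability the distilled model assigned to each token it generated for one question. All function names are hypothetical.

```python
from typing import Sequence


def tbd_score(token_probs: Sequence[float], ref_prob: float = 0.9) -> float:
    """Illustrative TBD-style score (assumed formula, not the paper's exact one).

    Averages how far each generated token's probability falls below the high
    reference probability `ref_prob`. Tokens at or above `ref_prob` contribute 0,
    so near-deterministic generations (expected for seen questions) score near 0.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    shortfalls = [max(0.0, ref_prob - p) for p in token_probs]
    return sum(shortfalls) / len(shortfalls)


def auc_lower_is_seen(seen: Sequence[float], unseen: Sequence[float]) -> float:
    """AUC when a LOWER score means 'seen in the distillation data'.

    Equals the probability that a random seen question scores below a random
    unseen one (Mann-Whitney formulation; ties count as half a win).
    """
    wins = 0.0
    for s in seen:
        for u in unseen:
            if s < u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(seen) * len(unseen))


def tpr_at_fpr(seen: Sequence[float], unseen: Sequence[float], max_fpr: float = 0.01) -> float:
    """Best true-positive rate over thresholds whose false-positive rate <= max_fpr.

    A question is flagged as 'seen' when its score is <= the threshold;
    seen questions are the positive class, unseen questions the negative class.
    """
    best = 0.0
    for t in sorted(set(seen) | set(unseen)):
        fpr = sum(u <= t for u in unseen) / len(unseen)
        if fpr <= max_fpr:
            tpr = sum(s <= t for s in seen) / len(seen)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Made-up per-token probabilities for illustration only (not data from the paper).
    seen_probs = [[0.99, 0.98, 0.97], [0.95, 0.99, 0.88]]
    unseen_probs = [[0.60, 0.90, 0.40], [0.70, 0.50, 0.99]]

    seen_scores = [tbd_score(p) for p in seen_probs]
    unseen_scores = [tbd_score(p) for p in unseen_probs]

    print("seen scores:  ", [round(x, 3) for x in seen_scores])
    print("unseen scores:", [round(x, 3) for x in unseen_scores])
    print("AUC:", auc_lower_is_seen(seen_scores, unseen_scores))
    print("TPR@1% FPR:", tpr_at_fpr(seen_scores, unseen_scores))
```

On this toy input every seen score is below every unseen score, so both metrics come out at 1.0. The pairwise AUC loop is quadratic but keeps the sketch dependency-free; for large evaluation sets, a library routine such as scikit-learn's `roc_auc_score` applied to negated scores (so that higher means "seen") would be the usual choice.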

Related papers