Fugu-MT paper translation (abstract): Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline

Paper summary: Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline

arxiv url: http://arxiv.org/abs/2605.26132v1
Date: Wed, 20 May 2026 17:26:10 GMT
Status: translation complete
Last updated in system: 2026-05-27 17:51:41.188689
Title: Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
Title (reference translation): Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
Authors: Tony Lee, Percy Liang
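Abstract summary: We propose Self-Verified Distillation, a simple post-training refinement algorithm for large language models. Self-Verified Distillation generates candidate solutions to unlabeled seed questions, filters them using prompt-based self-verification, and trains on the resulting self-curated dataset. We find that sampling more candidate generations and using a larger verification budget during training data construction yields higher-quality self-curated data.
Reference score (attention score computed independently by the site): 56.53954182896384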
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Can post-trained large language models (LLMs) further improve themselves using only unlabeled prompts, without external teachers or feedback from tools? We study this setting starting only from unlabeled seed questions with no ground-truth solutions, across three reasoning domains: math, science, and coding. We propose Self-Verified Distillation, a simple post-training refinement algorithm in which the model generates candidate solutions to these seed questions, filters them using prompt-based self-verification, and trains on the resulting self-curated dataset. Inspired by the UQ benchmark's use of multiple validators to screen candidate answers to hard unsolved questions, we adapt this validation-based filtering idea to self-training: the model filters its own generated solutions through a three-stage cascade of cycle-consistency, factuality, and correctness checks, accepting a solution only if it passes all stages with unanimous judge votes. We find that sampling more candidate generations and using a larger verification budget during training data construction produces higher-quality self-curated data and, in turn, better reasoning models. We then train Qwen3 models at multiple scales with Self-Verified Distillation and obtain gains across all three domains. For Qwen3-4B, our method improves aggregate held-out pass@1 by +16.7 points in math (AIME26 and HMMT), +11.1 points in science (GPQA Diamond and HLE), and +8.3 points in coding (LCBv5 and LCBv6), with gains also extending to 0.6B and 8B models. Compared to our test-time-only baseline (UQ-TTC), which improves performance by spending extra compute at inference time, Self-Verified Distillation achieves better performance in most settings while requiring only a single inference call at test time.
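Abstract (reference translation): Can post-trained large language models (LLMs) further improve themselves using only unlabeled prompts, without external teachers or feedback from tools? We study this setting across three reasoning domains (math, science, and coding), starting only from unlabeled seed questions with no ground-truth solutions. We propose Self-Verified Distillation, a simple post-training refinement algorithm in which the model generates candidate solutions to these seed questions, filters them using prompt-based self-verification, and trains on the resulting self-curated dataset. The model filters its own generated solutions through a three-stage cascade of cycle-consistency, factuality, and correctness checks, accepting a solution only if it passes every stage with unanimous votes. We find that sampling more candidate generations and using a larger verification budget during training data construction produces higher-quality self-curated data and, in turn, better reasoning models. We then train Qwen3 models at multiple scales with Self-Verified Distillation and obtain gains in all three domains. For Qwen3-4B, the method improves aggregate held-out pass@1 by +16.7 points in math (AIME26 and HMMT), +11.1 points in science (GPQA Diamond and HLE), and +8.3 points in coding (LCBv5 and LCBv6), with gains also extending to the 0.6B and 8B models. Whereas the test-time-only baseline (UQ-TTC) improves performance by spending extra compute at inference time, Self-Verified Distillation achieves better performance in most settings while requiring only a single inference call at test time.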
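The generate, filter, and collect steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Generator` and `Judge` callables, the `votes_per_stage` and `n_candidates` parameters, the toy stubs, and the choice to keep every accepted candidate (rather than, say, one per question) are all assumptions made here. Only the three stage names and the unanimous-vote acceptance rule come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

# Hypothetical interfaces: the abstract does not specify how the model is called.
Generator = Callable[[str, int], List[str]]  # (question, n_candidates) -> candidate solutions
Judge = Callable[[str, str, str], bool]      # (stage, question, solution) -> one judge vote

# Stage names follow the abstract; the prompts behind each check are not given there.
STAGES: Sequence[str] = ("cycle_consistency", "factuality", "correctness")


@dataclass(frozen=True)
class Example:
    question: str
    solution: str


def passes_cascade(question: str, solution: str, judge: Judge, votes_per_stage: int) -> bool:
    """Accept only if every vote in every stage is positive (unanimous)."""
    for stage in STAGES:
        for _ in range(votes_per_stage):
            if not judge(stage, question, solution):
                return False  # one dissenting vote rejects; later stages are skipped
    return True


def build_self_curated_dataset(
    questions: Iterable[str],
    generate: Generator,
    judge: Judge,
    n_candidates: int = 4,
    votes_per_stage: int = 3,
) -> List[Example]:
    """Sample candidates per seed question and keep those that pass the cascade."""
    dataset: List[Example] = []
    for question in questions:
        for solution in generate(question, n_candidates):
            if passes_cascade(question, solution, judge, votes_per_stage):
                dataset.append(Example(question, solution))
    return dataset


# Toy stand-ins for the model so the sketch runs without an LLM (purely illustrative).
def toy_generate(question: str, n: int) -> List[str]:
    return [f"{question} -> answer {i}" for i in range(n)]


def toy_judge(stage: str, question: str, solution: str) -> bool:
    # Pretend only even-numbered answers survive the correctness stage.
    if stage == "correctness":
        return int(solution[-1]) % 2 == 0
    return True


if __name__ == "__main__":
    data = build_self_curated_dataset(["q1", "q2"], toy_generate, toy_judge)
    for example in data:
        print(example)
```

In a real setting the same model would presumably back both `generate` and `judge`, and the resulting list would feed a fine-tuning step. Raising `n_candidates` or `votes_per_stage` corresponds to the larger sampling and verification budgets the abstract reports as producing higher-quality data.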


Related papers