Fugu-MT Paper Translation (Abstract): PRISM: Demystifying Retention and Interaction in Mid-Training

Paper abstract: PRISM: Demystifying Retention and Interaction in Mid-Training

arxiv url: http://arxiv.org/abs/2603.17074v1
Date: Tue, 17 Mar 2026 19:04:33 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.36509
Title: PRISM: Demystifying Retention and Interaction in Mid-Training
Title (reference translation): PRISM: Demystifying Retention and Interaction in Mid-Training
Authors: Bharat Runwal, Ashish Agrawal, Anurag Roy, Rameswar Panda,
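Abstract summary: PRISM is a comprehensive empirical study of mid-training design choices for large language models. Mid-training on approximately 27B high-quality tokens yields gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on science benchmarks while preserving general performance.
Reference score (independently calculated attention level): 20.198164159173647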
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and attention-Mamba hybrid), and scales from 3B to 24B parameters, we show that mid-training on approximately 27B high-quality tokens yields consistent gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on science benchmarks while preserving general performance. The full PRISM to RL pipeline improves macro-average across six reasoning benchmarks from under 12 to 29-42 (a 3-4x improvement), whereas RL applied directly to most of the base models remains substantially less effective, with AIME scores near zero. Data composition matters most at mid-training, not RL: including science data during mid-training unlocks +17 to +28 point GPQA-Diamond gains during RL, while changing the RL mix produces less than 2 point differences. Mechanistically, mid-training densely restructures over 90% of model weights, while RL makes sparse, front-loaded refinements to approximately 5% of parameters. Representation analysis (CKA) confirms that RL consistently preserves mid-training's representational geometry (over 0.998 CKA) across architectures. Crucially, RL applies identical weight changes regardless of starting point, yet only succeeds on mid-trained models, consistent with mid-training placing the model in a configuration from which RL can effectively improve performance. Our results demonstrate that retention-aware mid-training is highly effective for reliable reasoning enhancement and provide practical guidance for designing robust mid-training pipelines.
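Abstract (reference translation): We present PRISM, a comprehensive study of mid-training design choices for large language models. Through controlled experiments across four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and attention-Mamba hybrid), and scales from 3B to 24B parameters, we show that mid-training on approximately 27B high-quality tokens yields consistent gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on science benchmarks while preserving general performance. The full PRISM to RL pipeline improves the macro-average across six reasoning benchmarks from under 12 to 29-42 (a 3-4x improvement). Data composition matters most at mid-training, not RL: including science data during mid-training unlocks +17 to +28 point GPQA-Diamond gains during RL, while changing the RL mix produces differences of less than 2 points. Mechanistically, mid-training densely restructures over 90% of model weights, while RL makes sparse, front-loaded refinements to approximately 5% of parameters. Representation analysis (CKA) confirms that RL consistently preserves mid-training's representational geometry (over 0.998 CKA) across architectures. Crucially, RL applies identical weight changes regardless of starting point, yet succeeds only on mid-trained models, consistent with mid-training placing the model in a configuration from which RL can effectively improve performance. Our results show that retention-aware mid-training is effective for reliable reasoning enhancement and provide practical guidance for designing robust mid-training pipelines.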
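
The abstract reports two quantities: a CKA similarity between representations (over 0.998 after RL) and the fraction of weights changed by a training stage (over 90% for mid-training, approximately 5% for RL). The abstract does not say how either is computed, so the following is only a minimal sketch. It assumes the common linear CKA formulation and an element-wise tolerance test for "changed"; the function names and the `tol` threshold are hypothetical and not taken from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Assumption: the standard linear-kernel formulation; the paper's exact
    CKA variant is not specified in the abstract.
    """
    # Center each feature column so the comparison ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def fraction_changed(before: np.ndarray, after: np.ndarray, tol: float = 1e-6) -> float:
    """Share of entries whose absolute change exceeds tol.

    Assumption: tol is an illustrative threshold, not the paper's criterion.
    """
    return float((np.abs(after - before) > tol).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(256, 32))                    # stand-in activations
    rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
    # Linear CKA is invariant to rotations and isotropic scaling.
    print("CKA under rotation:", linear_cka(acts, 3.0 * acts @ rotation))

    weights = rng.normal(size=1000)                      # stand-in parameters
    updated = weights.copy()
    updated[:50] += 0.1                                  # sparse update to 5% of entries
    print("fraction changed:", fraction_changed(weights, updated))
```

In this sketch, a value near 1 from `linear_cka` means the two sets of activations share the same geometry up to rotation and scale, which is how a figure such as 0.998 would be read. `fraction_changed` gives a simple way to contrast a dense update with a sparse one.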

Related papers