Fugu-MT Paper Translation (Abstract): Test-Time Scaling Makes Overtraining Compute-Optimal

Paper overview: Test-Time Scaling Makes Overtraining Compute-Optimal

arxiv url: http://arxiv.org/abs/2604.01411v1
Date: Wed, 01 Apr 2026 21:17:32 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.065933
Title: Test-Time Scaling Makes Overtraining Compute-Optimal
Title (reference translation): Test-time scaling makes overtraining compute-optimal
Authors: Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
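Abstract summary: We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and the number of inference samples. $T^2$ modernizes pretraining scaling laws with the pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions.
Reference score (independently computed attention score): 23.520624926542755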
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well outside the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
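Abstract (reference translation): Modern LLMs scale at test time, for example through repeated sampling, so inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws such as Chinchilla do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and the number of inference samples under a fixed end-to-end budget. $T^2$ modernizes pretraining scaling laws with the pass@$k$ modeling used for test-time scaling and jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust across two distinct modeling approaches: measuring the joint scaling effect on task loss, and modeling the impact on task accuracy. Across eight downstream tasks, once inference cost is accounted for, the optimal pretraining decisions shift radically into the overtraining regime, well outside the range of standard pretraining scaling suites. We validate this by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, and confirm that they perform substantially better than models chosen by pretraining scaling alone. Finally, because frontier LLMs are post-trained, we show that the findings survive the post-training stage, which makes $T^2$ scaling meaningful in modern deployments.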
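
The trade-off the abstract describes can be illustrated with a rough sketch. This is not the paper's $T^2$ formulation, which the abstract does not spell out. It assumes the widely used unbiased pass@$k$ estimator from code-generation evaluation, plus the common rules of thumb of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per generated token. The function names and the parameters `tokens_per_sample` and `num_queries` are hypothetical, introduced only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Standard estimator: 1 - C(n - c, k) / C(n, k). Assumed here for
    illustration; the paper may model pass@k differently.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def end_to_end_flops(
    n_params: float,
    train_tokens: float,
    k: int,
    tokens_per_sample: float,
    num_queries: float,
) -> float:
    """Rough end-to-end compute: pretraining plus repeated-sampling inference.

    Assumptions (rules of thumb, not taken from the paper):
      training  ~ 6 * N * D FLOPs
      inference ~ 2 * N FLOPs per generated token, times k samples per query
    """
    train = 6.0 * n_params * train_tokens
    inference = 2.0 * n_params * tokens_per_sample * k * num_queries
    return train + inference
```

Under these assumptions, inference cost scales linearly with both model size and `k`, so at a fixed total budget a smaller model trained on more tokens leaves room for more samples per query. That is the direction of the shift toward overtraining that the abstract reports.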

Related papers