Fugu-MT Paper Translation (Abstract): Pretraining Scaling Laws for Generative Evaluations of Language Models

Paper summary: Pretraining Scaling Laws for Generative Evaluations of Language Models

arxiv url: http://arxiv.org/abs/2509.24012v1
Date: Sun, 28 Sep 2025 18:04:18 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.591222
Title: Pretraining Scaling Laws for Generative Evaluations of Language Models
Authors: Rylan Schaeffer, Noam Levi, Brando Miranda, Sanmi Koyejo
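Abstract summary: We present three different scaling laws for fitting pass-at-$k$ on generative evaluations and for predicting the pass-at-$k$ of the most expensive model. Our framework provides researchers and practitioners with insights and methodologies to forecast generative performance.
Reference score (independently computed attention metric): 30.6654523997984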
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural scaling laws have played a central role in modern machine learning, driving the field's ever-expanding scaling of parameters, data and compute. While much research has gone into fitting scaling laws and predicting performance on pretraining losses and on discriminative evaluations such as multiple-choice question-answering, comparatively little research has been done on fitting scaling laws and predicting performance on generative evaluations such as mathematical problem-solving or software engineering. We propose and evaluate three different pretraining scaling laws for fitting pass-at-$k$ on generative evaluations and for predicting pass-at-$k$ of the most expensive model using the performance of cheaper models. Our three scaling laws differ in the covariates used: (1) compute, (2) model parameters and tokens, (3) log likelihoods of gold reference solutions. We make four main contributions: (1) We show how generative evaluations offer new hyperparameters (in our setting, $k$) that researchers can use to control the scaling law parameters and the predictability of performance. (2) In terms of scaling law parameters, we find that the compute scaling law and the parameters + tokens scaling law stabilize for roughly the last $1.5$ to $2.5$ orders of magnitude, whereas the gold reference likelihood scaling law stabilizes for roughly the last $5$ orders of magnitude. (3) In terms of predictive performance, we find that all three scaling laws perform comparably, although the compute scaling law predicts slightly worse for small $k$ and the gold reference likelihood scaling law predicts slightly worse for large $k$. (4) We establish a theoretical connection that the compute scaling law emerges as the compute-optimal envelope of the parameters-and-tokens scaling law. Our framework provides researchers and practitioners with insights and methodologies to forecast generative performance.
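The abstract does not spell out how pass-at-$k$ is estimated or which functional form the compute scaling law takes, so the following is only an illustrative sketch under stated assumptions. It uses the widely used unbiased estimator $1 - \binom{n-c}{k} / \binom{n}{k}$ (from $n$ samples of which $c$ are correct) and assumes a hypothetical power law $-\log(\text{pass-at-}k) = a \cdot C^{-b}$ in compute $C$. The function names, the functional form, and the synthetic numbers are all assumptions made for illustration, not the authors' method or data.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass-at-k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # 1 - C(n-c, k) / C(n, k), evaluated as a numerically stable product.
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def fit_power_law(compute, pass_rates):
    """Fit the ASSUMED form -log(pass) = a * compute**(-b).

    Taking logs gives log(-log(pass)) = log(a) - b * log(compute), which is
    fitted here by ordinary least squares. Requires 0 < pass < 1.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(-math.log(p)) for p in pass_rates]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    a = math.exp(y_mean - slope * x_mean)
    return a, -slope


def predict_pass(a: float, b: float, compute: float) -> float:
    """Pass-at-k predicted by the assumed power law at a given compute."""
    return math.exp(-a * compute ** (-b))


# Synthetic demo (NOT data from the paper): generate points from a known law,
# fit on the cheaper models only, then extrapolate to the most expensive one.
TRUE_A, TRUE_B = 1e4, 0.2
budgets = [1e18, 1e19, 1e20, 1e21, 1e22]
observed = [predict_pass(TRUE_A, TRUE_B, c) for c in budgets]
a_hat, b_hat = fit_power_law(budgets[:-1], observed[:-1])
forecast = predict_pass(a_hat, b_hat, budgets[-1])
```

Fitting in log space keeps the sketch dependency-free. In practice a nonlinear fit with an irreducible-error term may be more appropriate, and the points need pass-at-$k$ strictly between 0 and 1 for the logarithms to be defined.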
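Contribution (4) can be illustrated with a short worked derivation. The abstract does not give the paper's functional forms, so this sketch assumes a Chinchilla-style additive power law in parameters $N$ and tokens $D$ for a loss-like quantity $L$ (for example $-\log$ pass-at-$k$), together with the common approximation $C \approx 6ND$. The symbols $A$, $B$, $\alpha$, $\beta$ and $G$ are introduced here for illustration only.

```latex
% Assumed parameters-and-tokens law and compute constraint (illustrative only)
L(N, D) = A N^{-\alpha} + B D^{-\beta}, \qquad C = 6 N D .

% Substitute D = C / (6N) and minimize over N at fixed C
L(N) = A N^{-\alpha} + B \left(\tfrac{C}{6}\right)^{-\beta} N^{\beta},
\qquad
\frac{dL}{dN} = -\alpha A N^{-\alpha - 1}
              + \beta B \left(\tfrac{C}{6}\right)^{-\beta} N^{\beta - 1} = 0 .

% Compute-optimal allocation
N^{*}(C) = G \left(\tfrac{C}{6}\right)^{\frac{\beta}{\alpha + \beta}},
\qquad
D^{*}(C) = G^{-1} \left(\tfrac{C}{6}\right)^{\frac{\alpha}{\alpha + \beta}},
\qquad
G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha + \beta}} .

% Envelope: a pure power law in compute
L^{*}(C) = \left(A G^{-\alpha} + B G^{\beta}\right)
           \left(\tfrac{C}{6}\right)^{-\frac{\alpha \beta}{\alpha + \beta}} .
```

Under these assumptions, the lower envelope over all $(N, D)$ allocations at fixed compute is itself a power law in $C$ with exponent $\alpha\beta / (\alpha + \beta)$, which is the sense in which a compute scaling law can emerge from a parameters-and-tokens law.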

Related papers