Fugu-MT paper translation (abstract): Evaluating the Robustness of Chinchilla Compute-Optimal Scaling

Paper abstract: Evaluating the Robustness of Chinchilla Compute-Optimal Scaling

arxiv url: http://arxiv.org/abs/2509.23963v1
Date: Sun, 28 Sep 2025 16:41:01 GMT
Status: translation complete
Last updated in system: 2025-09-30 22:32:19.557857
Title: Evaluating the Robustness of Chinchilla Compute-Optimal Scaling
Authors: Rylan Schaeffer, Noam Levi, Andreas Kirsch, Theo Guenais, Brando Miranda, Elyas Obbad, Sanmi Koyejo
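Abstract summary: The Chinchilla paper of Hoffmann et al. (2022) introduced the principle of compute-optimal scaling. Can practitioners still rely on Chinchilla's prescriptions? Perhaps surprisingly, which model parameters are used for the analyses does not meaningfully affect the key results.
Reference score (independently computed attention measure): 27.80623613251178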
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hoffmann et al. (2022)'s Chinchilla paper introduced the principle of compute-optimal scaling, laying a foundation for future scaling of language models. In the years since, however, valid concerns about Chinchilla have been raised: wide confidence intervals, discrepancies between its three approaches, and incongruities with other scaling laws. This raises a critical question for the field: Can practitioners still rely on Chinchilla's prescriptions? Our work demonstrates the answer is yes. We begin by uncovering that the model parameters central to Chinchilla's analyses were ambiguous: three interpretations are possible, with relative differences between different interpretations of model parameters as high as 15.2%. We find that, perhaps surprisingly, which model parameters are used for the analyses does not meaningfully affect key results: the scaling law estimates and the compute-optimal tokens-to-parameter ratio. Indeed, under one interpretation, the tokens-to-parameter ratio becomes more constant with the target compute budget. We then ask how distorted the Chinchilla model parameters could have been without meaningfully affecting the key results. By deliberately perturbing model parameters in four structured ways, we find that key Chinchilla results are most sensitive to additive or systematic errors, which can alter the otherwise flat trend of the optimal tokens-to-parameter ratio, but overall, Chinchilla's key results withstand sizable perturbations. Altogether, our findings offer the field renewed confidence in Chinchilla as a durable guide for scaling language models.
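As a rough illustration of why an additive error in parameter counts can bend a flat tokens-to-parameter trend while a constant multiplicative error cannot, the toy sketch below may help. It is not the paper's procedure (the paper refits scaling laws, and the abstract does not list its four perturbations); the model sizes, the flat ratio of 20, and the perturbation magnitudes are all hypothetical values chosen for illustration.

```python
# Toy illustration only: hypothetical numbers, not data or methods from the paper.

def tokens_per_parameter(n_params, n_tokens):
    """Tokens-to-parameter ratio D / N for one model."""
    return n_tokens / n_params

# Hypothetical compute-optimal models with an exactly flat ratio of 20.
params = [1e8, 1e9, 1e10]
tokens = [20 * n for n in params]

def ratios(perturb):
    """Recompute D / N after applying `perturb` to each parameter count."""
    return [tokens_per_parameter(perturb(n), d) for n, d in zip(params, tokens)]

def spread(rs):
    """Largest ratio divided by smallest; 1.0 means a perfectly flat trend."""
    return max(rs) / min(rs)

baseline = ratios(lambda n: n)              # unperturbed parameter counts
multiplicative = ratios(lambda n: 1.1 * n)  # assumed 10% overcount everywhere
additive = ratios(lambda n: n + 5e7)        # assumed fixed 50M-parameter overcount

for name, rs in [("baseline", baseline),
                 ("multiplicative", multiplicative),
                 ("additive", additive)]:
    print(f"{name:>14}: ratios={[round(r, 2) for r in rs]} spread={spread(rs):.3f}")
```

In this toy setting the multiplicative error rescales every ratio by the same factor, so the trend stays flat, whereas the additive error distorts small models far more than large ones and so tilts the trend across scales.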
