Fugu-MT Paper Translation (Abstract): From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining

Paper overview: From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining

arxiv url: http://arxiv.org/abs/2510.06548v1
Date: Wed, 08 Oct 2025 00:59:33 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.248856
Title: From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining
Title (reference translation): From Acceleration to Saturation: Scaling Behavior of Pretraining for Bootstrapped Language Models
Authors: Seng Pei Liew, Takuya Kato
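Abstract summary: We study the scaling behavior of bootstrapped pretraining and find that its scaling efficiency diminishes in a predictable manner. Our findings provide practical insights for efficient language model training and raise important considerations for the reuse of overtrained models.
Reference score (independently computed attention level): 2.569647910019739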
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bootstrapped pretraining, i.e., the reuse of a pretrained base model for further pretraining, such as continual pretraining or model growth, is promising for reducing the cost of training language models from scratch. However, its effectiveness remains unclear, especially when applied to overtrained base models. In this work, we empirically study the scaling behavior of bootstrapped pretraining and find that its scaling efficiency diminishes in a predictable manner: the scaling exponent with respect to second-stage pretraining tokens decreases logarithmically with the number of tokens used to pretrain the base model. The joint dependence on first- and second-stage tokens is accurately modeled by a simple scaling law. This saturation effect reveals a fundamental trade-off in multi-stage pretraining strategies: the more extensively a model is pretrained, the less additional benefit bootstrapping provides. Our findings provide practical insights for efficient language model training and raise important considerations for the reuse of overtrained models.
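Abstract (reference translation): Bootstrapped pretraining, that is, reusing a pretrained base model for further pretraining such as continual pretraining or model growth, promises to reduce the cost of training language models from scratch. However, its effectiveness remains unclear, especially when applied to overtrained base models. In this work, we empirically study the scaling behavior of bootstrapped pretraining and find that its scaling efficiency diminishes in a predictable manner. The joint dependence on first- and second-stage tokens is accurately modeled by a simple scaling law. This saturation effect reveals a fundamental trade-off in multi-stage pretraining strategies. Our findings provide practical insights for efficient language model training and raise important considerations for the reuse of overtrained models.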
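
The abstract's central claim, that the second-stage scaling exponent falls logarithmically with the number of first-stage tokens, can be illustrated with a small sketch. The abstract does not give the fitted law, so everything below is an assumption made for illustration only: the functional form alpha(D1) = alpha0 - beta * ln(D1), the constants `alpha0` and `beta`, and the function names are hypothetical and are not taken from the paper.

```python
import math


def second_stage_exponent(d1_tokens, alpha0=0.5, beta=0.015):
    """Hypothetical exponent alpha(D1) = alpha0 - beta * ln(D1).

    alpha0 and beta are made-up illustrative constants, not fitted values.
    """
    return alpha0 - beta * math.log(d1_tokens)


def reducible_loss_ratio(d1_tokens, d2_multiplier, alpha0=0.5, beta=0.015):
    """Factor by which the reducible loss shrinks when D2 is scaled by d2_multiplier.

    Assumes reducible loss is proportional to D2 ** (-alpha(D1)), so scaling
    D2 by k multiplies the reducible loss by k ** (-alpha(D1)).
    """
    alpha = second_stage_exponent(d1_tokens, alpha0, beta)
    return d2_multiplier ** (-alpha)


if __name__ == "__main__":
    # Compare base models pretrained on increasingly many first-stage tokens.
    for d1 in (1e9, 1e10, 1e11, 1e12):
        alpha = second_stage_exponent(d1)
        ratio = reducible_loss_ratio(d1, 10.0)
        print(f"D1={d1:.0e}  alpha={alpha:.3f}  reducible loss after 10x D2: {ratio:.3f}x")
```

Under these assumed constants, the exponent shrinks as D1 grows, so a tenfold increase in second-stage tokens removes a smaller fraction of the reducible loss for a more heavily pretrained base model. This is the saturation effect the abstract describes, shown here only qualitatively.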

Related papers