Fugu-MT Paper Translation (Summary): Mid-Training of Large Language Models: A Survey

Paper Overview: Mid-Training of Large Language Models: A Survey

arxiv url: http://arxiv.org/abs/2510.06826v1
Date: Wed, 08 Oct 2025 09:49:37 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.401415
Title: Mid-Training of Large Language Models: A Survey
Title (reference translation): Mid-Training of Large Language Models: A Survey
Authors: Kaixiang Mo, Yuxin Shi, Weiwei Weng, Zhiqiang Zhou, Shuman Liu, Haibo Zhang, Anxiang Zeng,
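Abstract summary: Large language models (LLMs) are typically developed through large-scale pre-training followed by task-specific fine-tuning. Recent advances highlight the importance of an intermediate mid-training stage. The paper introduces the first taxonomy of LLM mid-training, spanning data distribution, learning-rate scheduling, and long-context extension.
Reference score (attention score computed by the site): 12.322464058364405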
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are typically developed through large-scale pre-training followed by task-specific fine-tuning. Recent advances highlight the importance of an intermediate mid-training stage, where models undergo multiple annealing-style phases that refine data quality, adapt optimization schedules, and extend context length. This stage mitigates diminishing returns from noisy tokens, stabilizes convergence, and expands model capability in late training. Its effectiveness can be explained through gradient noise scale, the information bottleneck, and curriculum learning, which together promote generalization and abstraction. Despite widespread use in state-of-the-art systems, there has been no prior survey of mid-training as a unified paradigm. We introduce the first taxonomy of LLM mid-training spanning data distribution, learning-rate scheduling, and long-context extension. We distill practical insights, compile evaluation benchmarks, and report gains to enable structured comparisons across models. We also identify open challenges and propose avenues for future research and practice.
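Abstract (reference translation): Large language models (LLMs) are typically developed through large-scale pre-training followed by task-specific fine-tuning. Recent advances highlight the importance of an intermediate mid-training stage, in which the model goes through multiple annealing-style phases that refine data quality, adapt the optimization schedule, and extend context length. This stage mitigates the diminishing returns from noisy tokens, stabilizes convergence, and expands model capability in late training. Its effectiveness can be explained through the gradient noise scale, the information bottleneck, and curriculum learning. Despite its widespread use in state-of-the-art systems, no prior survey has treated mid-training as a unified paradigm. This paper covers LLM mid-training in terms of data distribution, learning-rate scheduling, and long-context extension. It distills practical insights, compiles evaluation benchmarks, and reports gains to enable structured comparisons across models. It also identifies open challenges and proposes directions for future research and practice.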
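The abstract refers to annealing-style phases that adapt the optimization schedule and refine data quality, but gives no formula. As a minimal sketch, assuming a warmup-stable-anneal learning-rate schedule with cosine decay (chosen here for illustration; it is not necessarily the schedule used by any model in the survey), the code below shows how a learning rate and a data mixture might be switched together when the anneal phase begins. All function names, step counts, learning rates, and mixture weights are hypothetical.

```python
import math


def lr_at_step(step: int, peak_lr: float = 3e-4, warmup_steps: int = 1_000,
               stable_steps: int = 8_000, anneal_steps: int = 1_000,
               min_lr: float = 3e-5) -> float:
    """Hypothetical warmup-stable-anneal learning-rate schedule.

    Phase 1 (warmup): linear ramp up to peak_lr.
    Phase 2 (stable): hold peak_lr.
    Phase 3 (anneal): cosine decay from peak_lr down to min_lr.
    After the anneal phase, the rate stays at min_lr.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    step -= warmup_steps
    if step < stable_steps:
        return peak_lr
    step -= stable_steps
    if step < anneal_steps:
        progress = step / anneal_steps  # goes from 0.0 toward 1.0
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
    return min_lr


def data_mixture(step: int, warmup_steps: int = 1_000,
                 stable_steps: int = 8_000) -> dict:
    """Hypothetical sampling weights: shift toward higher-quality sources
    once the anneal phase starts (weights are illustrative only)."""
    if step < warmup_steps + stable_steps:
        return {"web": 0.8, "code": 0.1, "math": 0.1}
    return {"web": 0.4, "code": 0.3, "math": 0.3}


if __name__ == "__main__":
    for s in (0, 999, 5_000, 9_000, 9_500, 10_000):
        print(s, f"{lr_at_step(s):.2e}", data_mixture(s))
```

Tying the data switch to the start of the decay mirrors the abstract's description of phases that change data quality and optimization schedule together; a real setup could use several such phases or a different decay shape.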


Related papers