Fugu-MT Paper Translation (Abstract): Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

Paper overview: Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

arxiv url: http://arxiv.org/abs/2603.08022v1
Date: Mon, 09 Mar 2026 06:58:00 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.613223
Title: Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
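Title (reference translation): A Capacity-Aware Mixture Law That Enables Efficient Optimization of LLM Data
Authors: Jingwei Li, Xinran Gu, Jingzhao Zhang
Abstract summary: The paper introduces a compute-efficient pipeline for data mixture scaling. First, it proposes CAMEL, a capacity-aware mixture law that models validation loss. It also introduces a loss-to-benchmark prediction law that estimates benchmark accuracy from validation loss.
Reference score (independently computed attention score): 20.220685778194156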
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A data mixture refers to how different data sources are combined to train large language models, and selecting an effective mixture is crucial for optimal downstream performance. Existing methods either conduct costly searches directly on the target model or rely on mixture scaling laws that fail to extrapolate well to large model sizes. We address these limitations by introducing a compute-efficient pipeline for data mixture scaling. First, we propose CAMEL, a capacity-aware mixture law that models validation loss with the nonlinear interplay between model size and mixture. We also introduce a loss-to-benchmark prediction law that estimates benchmark accuracy from validation loss, enabling end-to-end performance prediction for the target model. Next, we study how to allocate a fixed compute budget across model scales to fit the law and reduce prediction error. Finally, we apply our method to Mixture-of-Experts models with up to 7B-A150M parameters to fit the law, and verify the optimal mixture derived from the law by extrapolating to a 55B-A1.2B target model. Compared to prior methods, our approach reduces mixture optimization costs by 50% and improves downstream benchmark performance by up to 3%.
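Abstract (reference translation): A data mixture describes how different data sources are combined to train a large language model, and choosing an effective mixture is essential for the best downstream performance. Existing methods either run expensive searches directly on the target model or depend on mixture scaling laws that do not extrapolate well to large model sizes. The paper addresses these limitations with a compute-efficient pipeline for data mixture scaling. First, it proposes CAMEL, a capacity-aware mixture law that models validation loss through the nonlinear interplay between model size and mixture. It also introduces a loss-to-benchmark prediction law that estimates benchmark accuracy from validation loss, which enables end-to-end performance prediction for the target model. Next, it studies how to allocate a fixed compute budget across model scales in order to fit the law and reduce prediction error. Finally, the method is applied to Mixture-of-Experts models with up to 7B-A150M parameters to fit the law, and the optimal mixture derived from the law is verified by extrapolating to a 55B-A1.2B target model. Compared with prior methods, it reduces mixture optimization costs by 50% and improves downstream benchmark performance by up to 3%.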
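
The abstract does not give the functional form of the loss-to-benchmark prediction law, so the following is only a minimal sketch of the general idea: fit a mapping from validation loss to benchmark accuracy on small models, then evaluate it at a lower, extrapolated loss. The logit-linear form, the function names (`fit_line`, `predict_accuracy`), and all data points are assumptions made for illustration; none of them come from the paper.

```python
import math


def logit(p):
    """Map an accuracy in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_accuracy(loss, slope, intercept):
    """Invert the logit to turn a predicted score back into an accuracy."""
    z = intercept + slope * loss
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic (made-up) measurements from hypothetical small models:
# validation loss and the benchmark accuracy observed at that loss.
losses = [3.2, 3.0, 2.8, 2.6, 2.4]
accuracies = [0.30, 0.36, 0.43, 0.50, 0.57]

# Assumed form: logit(accuracy) is linear in validation loss.
slope, intercept = fit_line(losses, [logit(a) for a in accuracies])

# Evaluate the fitted map at a lower loss, standing in for a larger target model.
predicted = predict_accuracy(2.2, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted accuracy={predicted:.3f}")
```

In a pipeline like the one the abstract describes, the loss fed into such a mapping would itself come from the mixture law's prediction for the target model, which is what would make the prediction end-to-end.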

Related papers