Fugu-MT Paper Translation (Summary): TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Paper summary: TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

arxiv url: http://arxiv.org/abs/2508.17677v1
Date: Mon, 25 Aug 2025 05:18:32 GMT
Status: Translation complete
Last updated (in system): 2025-08-26 18:43:45.644356
Title: TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
Authors: Yifan Wang, Binbin Liu, Fengze Liu, Yuanfan Guo, Jiyao Deng, Xuecheng Wu, Weidong Zhou, Xiaohuan Zhou, Taifeng Wang,
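Abstract summary: TiKMiX is a method that dynamically adjusts the data mixture according to the model's evolving preferences. It introduces Group Influence, an efficient metric for evaluating the impact of data domains on the model.
Reference score (attention score computed by this site): 15.314880713541873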
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The data mixture used in the pre-training of a language model is a cornerstone of its final performance. However, a static mixing strategy is suboptimal, as the model's learning preferences for various data domains shift dynamically throughout training. Crucially, observing these evolving preferences in a computationally efficient manner remains a significant challenge. To address this, we propose TiKMiX, a method that dynamically adjusts the data mixture according to the model's evolving preferences. TiKMiX introduces Group Influence, an efficient metric for evaluating the impact of data domains on the model. This metric enables the formulation of the data mixing problem as a search for an optimal, influence-maximizing distribution. We solve this via two approaches: TiKMiX-D for direct optimization, and TiKMiX-M, which uses a regression model to predict a superior mixture. We trained models with different numbers of parameters, on up to 1 trillion tokens. TiKMiX-D exceeds the performance of state-of-the-art methods like REGMIX while using just 20% of the computational resources. TiKMiX-M leads to an average performance gain of 2% across 9 downstream benchmarks. Our experiments reveal that a model's data preferences evolve with training progress and scale, and we demonstrate that dynamically adjusting the data mixture based on Group Influence, a direct measure of these preferences, significantly improves performance by mitigating the underdigestion of data seen with static ratios.
