Fugu-MT Paper Translation (Summary): Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

Paper summary: Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

arxiv url: http://arxiv.org/abs/2605.15220v1
Date: Wed, 13 May 2026 02:29:19 GMT
Status: Translation complete
Last updated in system: 2026-05-18 17:44:16.259807
Title: Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time
Title (reference translation): Always Learning, Always Mixing: Efficient and Simple Data Mixing
Authors: Michael Y. Hu, Apurva Gandhi, Kyunghyun Cho, Tal Linzen, Pratyusha Sharma,
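Abstract summary: OP-Mix is a data mixing algorithm that operates across the entire language model training lifecycle. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively.
Reference score (independently calculated attention score): 51.671620992989375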
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data mixing decides how to combine different sources or types of data and is a consequential problem throughout language model training. In pretraining, data composition is a key determinant of model quality; in continual learning and adaptation, it governs what is retained and acquired. Yet existing data mixing methods address only one phase of this lifecycle at a time: some require smaller proxy models tied to a single training phase, others assume a fixed domain set, and continual learning lacks principled guidance altogether. We argue that data mixing is fundamentally an online decision making problem -- one that recurs throughout training and demands a single, unified solution. We introduce OP-Mix (On-Policy Mix), a data mixing algorithm that operates across the entire language model training lifecycle. Our main insight is that candidate data mixtures can be cheaply simulated by interpolating between low-rank adapters trained directly on the current model, eliminating separate proxy models and ensuring the search is always grounded in the model's actual learning dynamics. Across pretraining, continual midtraining, and continual instruction tuning, OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of the baselines. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively. OP-Mix suggests a different view of language model training: not a sequence of distinct phases, but a single continuous process of learning from data.
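Abstract (reference translation): Data mixing decides how different sources or types of data are combined. In pretraining, data composition is a key determinant of model quality; in continual learning and adaptation, it governs what is retained and what is acquired. However, existing data mixing methods address only one phase of this lifecycle at a time: some require smaller proxy models tied to a single training phase, and others assume a fixed set of domains. We argue that data mixing is fundamentally an online decision-making problem, one that recurs throughout training and calls for a single, unified solution. OP-Mix (On-Policy Mix) is a data mixing algorithm that operates across the entire language model training lifecycle. Our main insight is that candidate data mixtures can be simulated cheaply by interpolating between low-rank adapters trained directly on the current model, which eliminates separate proxy models and keeps the search grounded in the model's actual learning dynamics. Across pretraining, continual midtraining, and continual instruction tuning, OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of the baselines. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively. OP-Mix suggests viewing language model training not as a sequence of distinct phases but as a single continuous process of learning from data.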
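
The adapter-interpolation idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state the interpolation rule, so the code assumes a plain mixture-weighted sum of low-rank updates in weight space. The function names (`lora_delta`, `interpolate_adapters`, `pick_mixture`), the toy loss, and the candidate grid are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lora_delta(A, B):
    """Dense weight update represented by a low-rank adapter pair: B @ A."""
    return B @ A


def interpolate_adapters(base_weight, adapters, mixture):
    """Return base_weight plus the mixture-weighted sum of adapter updates.

    ASSUMPTION: linear interpolation in weight space. The abstract only says
    that adapters are interpolated, not how.
    """
    mixture = np.asarray(mixture, dtype=float)
    if mixture.shape != (len(adapters),):
        raise ValueError("need exactly one mixture weight per adapter")
    if np.any(mixture < 0) or not np.isclose(mixture.sum(), 1.0):
        raise ValueError("mixture weights must be non-negative and sum to 1")
    delta = sum(w * lora_delta(A, B) for w, (A, B) in zip(mixture, adapters))
    return base_weight + delta


def pick_mixture(base_weight, adapters, candidates, loss_fn):
    """Score every candidate mixture on interpolated weights; return the best."""
    losses = [
        loss_fn(interpolate_adapters(base_weight, adapters, c))
        for c in candidates
    ]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]


# Toy setup (hypothetical): one 8x8 weight matrix and two rank-2 adapters,
# standing in for adapters trained on two data domains.
rng = np.random.default_rng(0)
d, r = 8, 2
base = rng.normal(size=(d, d))
adapters = [(rng.normal(size=(r, d)), rng.normal(size=(d, r))) for _ in range(2)]

# The toy "validation loss" is the squared distance to weights built from a
# 0.25 / 0.75 blend, so the search has a known optimum to recover.
target = base + 0.25 * lora_delta(*adapters[0]) + 0.75 * lora_delta(*adapters[1])


def toy_loss(weight):
    return float(np.sum((weight - target) ** 2))


# Candidate mixtures on a coarse grid over the 2-domain simplex.
candidates = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 5)]
best_mix, best_loss = pick_mixture(base, adapters, candidates, toy_loss)
```

In this toy problem the grid contains the blend used to build `target`, so `best_mix` is the 0.25 / 0.75 mixture. In a real setting the loss would be a held-out evaluation of the model with the interpolated adapter applied, and scoring a candidate would cost only a forward pass rather than a training run.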

Related papers