Fugu-MT Paper Translation (Summary): Scaling Categorical Flow Maps

Paper summary: Scaling Categorical Flow Maps

arxiv url: http://arxiv.org/abs/2605.07820v2
Date: Mon, 11 May 2026 13:50:18 GMT
Status: Translation complete
Last updated in system: 2026-05-12 19:24:01.425365
Title: Scaling Categorical Flow Maps
Authors: Oscar Davis, Anastasiia Filippova, Pierre Ablin, Victor Turrisi, Amitis Shidani, Marco Cuturi, Louis Béthune
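Abstract summary: Continuous diffusion and flow matching models could be a powerful alternative to autoregressive approaches for language modelling. We show how to generate diverse, high-quality text in as few as 4 inference steps while maintaining near-data-level token entropy. We uncover the challenges that arise from training these models at scale and provide prescriptive insights on loss weighting and time scheduling.
Reference score (site-computed attention score): 29.25832277355597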
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continuous diffusion and flow matching models could represent a powerful alternative to autoregressive approaches for language modelling (LM), as they unlock a host of advantages currently reserved for continuous modalities, including accelerated sampling and tilting. Recently, several works have demonstrated the possibility of generating discrete data continuously by a simple flow matching process between a Gaussian and the one-hot encoded data distribution. They have further shown the feasibility of accelerated sampling via Categorical Flow Maps (CFMs), resulting in competitive sample quality in the few-step regime. However, this method had only been evaluated at relatively modest scales (<1B), leaving the question of its scalability completely open. In this article, we train a 1.7B-parameter base flow model on 2.1T tokens and self-distill it into a CFM that generates diverse, high-quality text in as few as 4 inference steps while maintaining near-data-level token entropy. Furthermore, we introduce a likelihood bound for CFMs in the semi-discrete setting, and show that they can be used to score the model on standard LM benchmarks, achieving results in the same range as discrete diffusion methods. Finally, we uncover some of the challenges that arise from training these models at scale, and we provide prescriptive insights on loss weighting and time scheduling.
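The abstract describes generating discrete data by flow matching between a Gaussian and one-hot encoded tokens, then sampling in only a few steps. The following is a minimal, stdlib-only Python sketch of that general idea. It is not the authors' implementation. It assumes a linear path x_t = (1 - t) x0 + t x1, with x0 ~ N(0, I) and x1 a one-hot vector. It also assumes a hypothetical `denoiser` that predicts a probability vector over the vocabulary, and it uses plain Euler integration of the base flow in place of a learned Categorical Flow Map. All helper names (`training_pair`, `euler_sample`, `toy_denoiser`, `token_entropy`) are illustrative. Time is drawn uniformly rather than from the paper's schedule, and "token entropy" is taken here, as an assumption, to be the Shannon entropy of the empirical token distribution of the samples.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def one_hot(token: int, vocab_size: int) -> Vector:
    """Encode a token id as a one-hot vector (the data endpoint x1)."""
    vec = [0.0] * vocab_size
    vec[token] = 1.0
    return vec


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Assumed linear path x_t = (1 - t) * x0 + t * x1 from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def training_pair(token: int, vocab_size: int, rng: random.Random) -> Tuple[float, Vector, Vector]:
    """Hypothetical helper: draw (t, x_t, x1) for one training example.

    A network would be trained to predict x1 from (x_t, t). Here t is uniform
    only for illustration; it is not the paper's time schedule.
    """
    x1 = one_hot(token, vocab_size)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    t = rng.random()
    return t, interpolate(x0, x1, t), x1


def euler_sample(
    denoiser: Callable[[Vector, float], Vector],
    vocab_size: int,
    num_steps: int,
    rng: random.Random,
) -> int:
    """Integrate from Gaussian noise (t=0) toward the simplex (t=1) with Euler steps.

    `denoiser(x_t, t)` is a hypothetical model returning a probability vector,
    read as an estimate of E[x1 | x_t]. On the linear path the velocity is
    then (E[x1 | x_t] - x_t) / (1 - t).
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t < 1 at every step, so the division below is safe
        probs = denoiser(x, t)
        x = [xi + dt * (pi - xi) / (1.0 - t) for xi, pi in zip(x, probs)]
    # Decode the final point to the highest-scoring token id.
    return max(range(vocab_size), key=lambda i: x[i])


def token_entropy(tokens: Sequence[int]) -> float:
    """Shannon entropy (nats) of the empirical token distribution of a sample."""
    n = len(tokens)
    return sum((c / n) * math.log(n / c) for c in Counter(tokens).values())


def toy_denoiser(x_t: Vector, t: float) -> Vector:
    # Hypothetical stand-in for a trained network: it ignores its input and
    # always predicts the same distribution over a 4-token vocabulary.
    return [0.1, 0.2, 0.6, 0.1]


rng = random.Random(0)
samples = [euler_sample(toy_denoiser, vocab_size=4, num_steps=4, rng=rng) for _ in range(8)]
print(samples)  # [2, 2, 2, 2, 2, 2, 2, 2]
print(token_entropy(samples))  # 0.0
```

On the last Euler step dt / (1 - t) = 1, so the final state equals the denoiser's last prediction. With the constant toy model, every sample therefore decodes to token 2 and the entropy is zero. A trained network would produce varied tokens, and `num_steps=4` would correspond to the few-step regime the abstract mentions. A flow map differs from this sketch because it learns to jump directly between two time points rather than relying on the Euler update.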
