Fugu-MT Paper Translation (Summary): LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Paper summary: LLaDA2.0: Scaling Up Diffusion Language Models to 100B

arxiv url: http://arxiv.org/abs/2512.15745v1
Date: Wed, 10 Dec 2025 09:26:18 GMT
Status: Translation complete
Last updated in system: 2025-12-19 18:10:31.650386
Title: LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Title (reference translation): LLaDA2.0: Scaling Up Diffusion Language Models to 100B Parameters
Authors: Tiwei Bie, Maosong Cao, Kun Chen, Lun Du, Mingliang Gong, Zhuochen Gong, Yanmei Gu, Jiaqi Hu, Zenan Huang, Zhenzhong Lan, Chengxi Li, Chongxuan Li, Jianguo Li, Zehuan Li, Huabin Liu, Ling Liu, Guoshan Lu, Xiaocheng Lu, Yuxin Ma, Jianfeng Tan, Lanning Wei, Ji-Rong Wen, Yipeng Xing, Xiaolu Zhang, Junbo Zhao, Da Zheng, Jun Zhou, Junlin Zhou, Zhanchao Zhou, Liwang Zhu, Yihong Zhuang
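Abstract summary: LLaDA2.0 scales discrete diffusion large language models (dLLM) up to 100B total parameters. LLaDA2.0 upholds the design principles of knowledge inheritance, progressive adaptation, and efficiency awareness. LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants, are optimized for practical deployment.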
Reference score (independently computed attention score): 96.84156938318931
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 upholds the design principles of knowledge inheritance, progressive adaptation, and efficiency awareness, and seamlessly converts a pre-trained AR model into a dLLM with a novel 3-phase, block-level, WSD-based training scheme: progressively increasing the block size in block diffusion (warm-up), large-scale full-sequence diffusion (stable), and reverting to compact-size block diffusion (decay). Along with post-training alignment with SFT and DPO, we obtain LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants optimized for practical deployment. By preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models were open-sourced.
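Abstract (reference translation): This paper describes LLaDA2.0, a tuple of discrete diffusion large language models (dLLM) that scale to as many as 100B parameters through systematic conversion from auto-regressive (AR) models. Rather than training from scratch at high cost, LLaDA2.0 upholds the design principles of knowledge inheritance, progressive adaptation, and efficiency awareness, and seamlessly converts a pre-trained AR model into a dLLM with a novel 3-phase, block-level, WSD-based training scheme. Together with post-training alignment with SFT and DPO, we obtain LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B). While preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models were open-sourced.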
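
The three-phase training scheme named in the abstract (warm-up with growing block size, stable full-sequence diffusion, decay back to a compact block size) can be pictured as a schedule over training steps. The sketch below is illustrative only: the function name `block_size_schedule`, the doubling rule during warm-up, and every numeric default (sequence length 4096, block size 32) are assumptions for illustration and are not taken from the paper.

```python
def block_size_schedule(step, warmup_steps, stable_steps,
                        seq_len=4096, start_block=32, final_block=32):
    """Return a diffusion block size for a training step (hypothetical sketch).

    Phases, following the abstract's description:
      warm-up: block size grows progressively (here: doubling, an assumption)
      stable:  full-sequence diffusion, i.e. block size equals seq_len
      decay:   revert to a compact block size
    """
    if step < warmup_steps:
        # Candidate block sizes: start_block, 2*start_block, ... below seq_len.
        sizes = []
        b = start_block
        while b < seq_len:
            sizes.append(b)
            b *= 2
        if not sizes:
            return seq_len
        # Spread the candidate sizes evenly over the warm-up steps.
        stage = step * len(sizes) // warmup_steps
        return sizes[stage]
    if step < warmup_steps + stable_steps:
        # Stable phase: one block spans the whole sequence.
        return seq_len
    # Decay phase: every remaining step uses the compact block size.
    return final_block


if __name__ == "__main__":
    for s in (0, 10, 19, 20, 79, 80):
        print(s, block_size_schedule(s, warmup_steps=20, stable_steps=60))
```

The real schedule, block sizes, and phase lengths used for LLaDA2.0 are specified in the paper itself, not in this abstract.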

Related papers