Fugu-MT Paper Translation (Abstract): Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper overview: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

arxiv url: http://arxiv.org/abs/2603.19220v2
Date: Sun, 22 Mar 2026 00:47:52 GMT
Status: Translation complete
Last updated in system: 2026-03-24 12:36:10.076043
Title: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Title (reference translation): Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Authors: Zhuolin Yang, Zihan Liu, Yang Chen, Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, Dongfu Jiang, Jiafan He, Renjie Pi, Grace Lam, Nayeon Lee, Alexander Bukharin, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
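Abstract summary: Nemotron-Cascade 2 is an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve gold-medal-level performance at the 2025 International Mathematical Olympiad.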
Reference score (independently computed attention metric): 114.31258597102926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.
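Abstract (reference translation): We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve gold-medal-level performance at the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals. In contrast to Nemotron-Cascade 1, the key technical advances are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader range of reasoning and agentic domains. In addition, throughout the Cascade RL process we introduce multi-domain on-policy distillation from the strongest intermediate teacher model for each domain, which lets us efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.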
