Fugu-MT Paper Translation (Abstract): Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

Paper abstract: Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

arxiv url: http://arxiv.org/abs/2605.07182v1
Date: Fri, 08 May 2026 03:20:36 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.77348
Title: Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Title (reference translation): Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Authors: Ali Taghibakhshi, Ruisi Cai, Saurav Muralidharan, Sharath Turuvekere Sreenivas, Aditya Vavre, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Sheldon Liang, Marcin Chochowski, Zijia Chen, Akhiad Bercovich, Ran Zilberstein, Ran El-Yaniv, Yonatan Geifman, Daniel Korzekwa, Yoshi Suhara, Oluwatobi Olabiyi, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov
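Abstract summary: We introduce Star Elastic, a new post-training method for large language models (LLMs). Star Elastic adds N nested submodels to a given parent reasoning model in a single post-training job, using the compute of one run (N-fold savings). Building on the Nemotron Elastic framework, we apply Star Elastic to the NVIDIA Nemotron Nano models.
Reference score (independently computed attention level): 27.041571161298688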
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training a family of large language models (LLMs), either from scratch or via iterative compression, is prohibitively expensive and inefficient, requiring separate training runs for each model in the family. In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model using the compute of one run (N-fold savings) via a single post-training job. Beyond reducing training costs, Star Elastic also addresses a fundamental limitation of efficient reasoning: the rigidity of static architectures, which forces the allocation of constant resources regardless of token difficulty. By unlocking elastic budget control, Star Elastic enables a novel inference scheme that uses different submodels for each reasoning phase (thinking and answering). Star Elastic supports (1) nesting along the SSM, embedding channel, MoE, and FFN axes, (2) learning nested submodels via an end-to-end trainable router, and (3) curriculum-based knowledge distillation. Building on the Nemotron Elastic framework, we apply Star Elastic to the NVIDIA Nemotron Nano models, with a particular focus on hybrid Mixture-of-Experts (MoE) architectures: from Nemotron Nano v3 (30B/3.6A), we generate 23B (2.8A) and 12B (2.0A) variants with 160B training tokens. All nested models match or outperform independently trained baselines of comparable size and achieve a 360x reduction versus pretraining from scratch and a 7x reduction over state-of-the-art compression. Crucially, elastic budget control advances the accuracy-latency Pareto frontier, achieving up to 16% higher accuracy and 1.9x lower latency via dynamic per-phase model selection. We further extend Star Elastic to quantized regimes via Quantization-Aware Distillation (QAD), producing nested NVFP4 and FP8 elastic checkpoints that preserve zero-shot slicing while delivering smaller deployment footprints.
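The abstract describes nested submodels that can be obtained by "zero-shot slicing" of a parent checkpoint. The following is a minimal sketch of that weight-sharing idea along a single FFN axis only, assuming the submodel simply keeps the first `width` hidden units. The `NestedFFN` class, the prefix-slicing rule, and the toy weights are illustrative assumptions, not the paper's implementation; the learned router and knowledge distillation mentioned in the abstract are not modeled here.

```python
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


class NestedFFN:
    """Toy FFN whose smaller variants reuse a prefix of the parent's hidden units.

    Assumption for illustration: a submodel of hidden size `width` is the
    parent restricted to its first `width` hidden units, so all sizes share
    one set of weights.
    """

    def __init__(self, w_in: Matrix, w_out: Matrix):
        self.w_in = w_in    # shape: hidden x d_model
        self.w_out = w_out  # shape: d_model x hidden

    def forward(self, x: List[float], width: int) -> List[float]:
        w_in = self.w_in[:width]                      # first `width` hidden rows
        w_out = [row[:width] for row in self.w_out]   # matching output columns
        return matvec(w_out, relu(matvec(w_in, x)))


# Toy parent with d_model = 2 and hidden = 3.
ffn = NestedFFN(
    w_in=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    w_out=[[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]],
)
x = [1.0, 2.0]
full_out = ffn.forward(x, width=3)   # parent model
small_out = ffn.forward(x, width=2)  # nested submodel, no extra weights stored
```

Because every width reads from the same arrays, one checkpoint serves all sizes in this sketch; which units a real submodel keeps would be decided by the training procedure rather than by a fixed prefix.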
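Abstract (reference translation): Training a family of large language models (LLMs), whether from scratch or through iterative compression, is prohibitively expensive and inefficient, because each model in the family needs its own training run. This paper introduces Star Elastic, a new LLM post-training method that adds N nested submodels to a given parent reasoning model in a single post-training job, using the compute of one run (N-fold savings). Beyond reducing training cost, Star Elastic also addresses a fundamental limitation of efficient reasoning: the rigidity of static architectures, which forces a constant allocation of resources regardless of token difficulty. By unlocking elastic budget control, Star Elastic enables a new inference scheme that uses a different submodel for each reasoning phase (thinking and answering). Star Elastic supports (1) nesting along the SSM, embedding channel, MoE, and FFN axes, (2) learning nested submodels through an end-to-end trainable router, and (3) curriculum-based knowledge distillation. Building on the Nemotron Elastic framework, Star Elastic is applied to the NVIDIA Nemotron Nano models, with a particular focus on hybrid Mixture-of-Experts (MoE) architectures: from Nemotron Nano v3 (30B/3.6A), 23B (2.8A) and 12B (2.0A) variants are generated with 160B training tokens. All nested models match or outperform independently trained baselines of comparable size, with a 360x reduction versus pretraining from scratch and a 7x reduction over state-of-the-art compression. Importantly, elastic budget control advances the accuracy-latency Pareto frontier, achieving up to 16% higher accuracy and 1.9x lower latency through dynamic per-phase model selection. Star Elastic is further extended to quantized regimes via Quantization-Aware Distillation (QAD), producing nested NVFP4 and FP8 elastic checkpoints that preserve zero-shot slicing while delivering smaller deployment footprints.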
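The per-phase model selection described in the abstract (different submodels for thinking and answering) can be sketched as a decoding loop that swaps models when the thinking phase ends. This is a hedged illustration only: the `generate` function, the `</think>` and `<eos>` markers, the stub models, and the choice of which size handles which phase are all assumptions made for the example, not details taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a "model" maps the tokens so far to the next token.
NextToken = Callable[[List[str]], str]


def generate(
    models: Dict[str, NextToken],
    think_with: str,
    answer_with: str,
    prompt: List[str],
    end_think: str = "</think>",  # assumed end-of-thinking marker
    eos: str = "<eos>",           # assumed end-of-sequence marker
    max_tokens: int = 64,
) -> Tuple[List[str], List[str]]:
    """Decode with one submodel while thinking and another while answering.

    Returns the generated tokens and, per token, the name of the submodel
    that produced it.
    """
    tokens = list(prompt)
    used: List[str] = []
    active = think_with
    for _ in range(max_tokens):
        tok = models[active](tokens)
        tokens.append(tok)
        used.append(active)
        if tok == eos:
            break
        if tok == end_think:
            active = answer_with  # switch submodel for the answering phase
    return tokens[len(prompt):], used


# Stub "submodels" standing in for nested variants of one checkpoint.
def small(tokens: List[str]) -> str:
    return "step" if tokens.count("step") < 2 else "</think>"


def large(tokens: List[str]) -> str:
    return "answer" if "answer" not in tokens else "<eos>"


out, used = generate({"12B": small, "30B": large}, "12B", "30B", ["question"])
```

In this toy run the three thinking tokens come from the stub labeled `12B` and the two answering tokens from the stub labeled `30B`; the opposite assignment is equally expressible by swapping `think_with` and `answer_with`.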

