Fugu-MT Paper Translation (Summary): Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

Paper overview: Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

arxiv url: http://arxiv.org/abs/2508.15884v2
Date: Mon, 08 Sep 2025 02:44:54 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.294171
Title: Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Title (reference translation): Jet-Nemotron: An Efficient Language Model via Post Neural Architecture Search
Authors: Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai
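Abstract summary: Jet-Nemotron is a new family of hybrid-architecture language models. It matches or exceeds the accuracy of leading full-attention models.
Reference score (site-computed popularity metric): 42.46046429414803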
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation throughput. Jet-Nemotron is developed using Post Neural Architecture Search (PostNAS), a novel neural architecture exploration pipeline that enables efficient model design. Unlike prior approaches, PostNAS begins with a pre-trained full-attention model and freezes its MLP weights, allowing efficient exploration of attention block designs. The pipeline includes four key components: (1) learning optimal full-attention layer placement and elimination, (2) linear attention block selection, (3) designing new attention blocks, and (4) performing hardware-aware hyperparameter search. Our Jet-Nemotron-2B model achieves comparable or superior accuracy to Qwen3, Qwen2.5, Gemma3, and Llama3.2 across a comprehensive suite of benchmarks while delivering up to 53.6x generation throughput speedup and 6.1x prefilling speedup. It also achieves higher accuracy on MMLU and MMLU-Pro than recent advanced MoE full-attention models, such as DeepSeek-V3-Small and Moonlight, despite their larger scale with 15B total and 2.2B activated parameters.
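The second PostNAS component selects a linear attention block, but the abstract does not say which block Jet-Nemotron ends up using. As a generic illustration only, the Python sketch below implements basic causal linear attention with the feature map phi(x) = elu(x) + 1 (an assumption taken from standard linear-attention formulations, not from this paper). It checks that a fixed-size recurrent state reproduces the quadratic-time causal form. All function names are hypothetical.

```python
import math


def phi(x):
    """Feature map elu(x) + 1, applied elementwise; keeps every feature positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention_quadratic(qs, ks, vs):
    """O(T^2) form: out_t = sum_{s<=t} (phi(q_t).phi(k_s)) v_s / sum_{s<=t} phi(q_t).phi(k_s)."""
    d_v = len(vs[0])
    outs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        norm = sum(weights)
        outs.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / norm
                     for j in range(d_v)])
    return outs


def causal_linear_attention_recurrent(qs, ks, vs):
    """O(T) form with a fixed-size state: S += phi(k_t) v_t^T, z += phi(k_t)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # d_k x d_v, does not grow with T
    z = [0.0] * d_k                            # running sum of phi(k_s) for the normalizer
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = phi(q)
        norm = dot(fq, z)
        outs.append([sum(fq[i] * state[i][j] for i in range(d_k)) / norm
                     for j in range(d_v)])
    return outs


# Toy sequence of 3 tokens with d_k = d_v = 2.
qs = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
ks = [[0.2, 0.1], [-0.1, 0.4], [0.5, -0.3]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

a = causal_linear_attention_quadratic(qs, ks, vs)
b = causal_linear_attention_recurrent(qs, ks, vs)
print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))  # True
```

The recurrent state has shape d_k x d_v no matter how many tokens have been processed. A full-attention layer, by contrast, must keep every past key and value. This difference is a common explanation for why replacing full-attention layers can raise decoding throughput.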
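Abstract (reference translation): The proposed Jet-Nemotron is a new family of hybrid-architecture language models that matches or exceeds the accuracy of leading full-attention models while significantly improving generation throughput. Jet-Nemotron is developed using Post Neural Architecture Search (PostNAS). Unlike prior approaches, PostNAS starts from a pre-trained full-attention model and freezes its MLP weights, which enables efficient exploration of attention block designs. The pipeline includes four key components: (1) learning optimal full-attention layer placement and elimination, (2) linear attention block selection, (3) designing new attention blocks, and (4) performing hardware-aware hyperparameter search. Our Jet-Nemotron-2B model achieves accuracy comparable or superior to Qwen3, Qwen2.5, Gemma3, and Llama3.2 across the benchmark suite, while delivering up to 53.6x generation throughput speedup and 6.1x prefilling speedup. Its accuracy on MMLU and MMLU-Pro is also higher than that of recent MoE full-attention models such as DeepSeek-V3-Small and Moonlight.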
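The first PostNAS component decides where full-attention layers stay and where they are removed. The back-of-the-envelope Python sketch below makes that concrete by comparing decoding-time memory for two stacks. One uses full attention in every layer. The other is a hybrid that keeps only a few full-attention layers. The sketch assumes standard key-value caching with 2-byte (bf16) elements and a fixed-size per-head state for linear-attention layers. Every layer count, head count, and head size below is hypothetical and is not Jet-Nemotron's published configuration.

```python
def kv_cache_bytes(num_full_attn_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Memory for cached keys and values in full-attention layers.

    Each layer stores one key and one value tensor of shape
    [batch_size, num_kv_heads, seq_len, head_dim], so the cost grows with seq_len.
    """
    return 2 * num_full_attn_layers * batch_size * num_kv_heads * seq_len * head_dim * bytes_per_elem


def linear_state_bytes(num_linear_layers, num_heads, d_k, d_v,
                       batch_size=1, bytes_per_elem=2):
    """Memory for fixed-size recurrent states in linear-attention layers.

    Each layer keeps a [batch_size, num_heads, d_k, d_v] state; seq_len does not appear.
    """
    return num_linear_layers * batch_size * num_heads * d_k * d_v * bytes_per_elem


# Hypothetical 28-layer model decoding a 64K-token context in bf16.
SEQ_LEN = 65536
all_full = kv_cache_bytes(28, num_kv_heads=2, head_dim=128, seq_len=SEQ_LEN)

# Hypothetical hybrid: keep 3 full-attention layers, replace the other 25 with linear attention.
hybrid = (kv_cache_bytes(3, num_kv_heads=2, head_dim=128, seq_len=SEQ_LEN)
          + linear_state_bytes(25, num_heads=16, d_k=64, d_v=64))

MIB = 1024 ** 2
print(f"all full attention: {all_full / MIB:.1f} MiB")
print(f"hybrid:             {hybrid / MIB:.1f} MiB")
print(f"reduction:          {all_full / hybrid:.1f}x")
```

Under these assumed numbers, the hybrid stack's memory is dominated by its few remaining full-attention layers. This suggests why the placement and number of full-attention layers is worth searching over.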


Related papers