Fugu-MT Paper Translation (Abstract): PrunePath: Towards Highly Structured Sparse Language Models

Paper overview: PrunePath: Towards Highly Structured Sparse Language Models

arxiv url: http://arxiv.org/abs/2605.28283v1
Date: Wed, 27 May 2026 10:29:52 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.976417
Title: PrunePath: Towards Highly Structured Sparse Language Models
Authors: Zhexuan Gu, Zixun Fu, Yancheng Yuan
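Abstract summary: We introduce PrunePath, a budget-adaptive structured sparsification framework for FFN layers. PrunePath replaces independent expert-wise thresholding with a softmax-normalized routing distribution. Across NLU, NLG, and instruction-tuning evaluations, PrunePath achieves a favorable sparsity-performance trade-off.
Reference score (attention score computed independently by this site): 8.390447915838122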
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Feed-forward networks (FFNs) dominate the parameter count and computation of modern language models, yet existing pruning methods often struggle to convert sparsity into hardware-friendly inference efficiency gains. We introduce PrunePath, a budget-adaptive structured sparsification framework for FFN layers. Built on MoEfication, PrunePath replaces independent expert-wise thresholding with a softmax-normalized routing distribution and activates important experts under a cumulative-mass threshold. This formulation imposes a token-level probability budget, enabling adaptive expert counts and a direct inference-time sparsity knob from a single checkpoint. Across NLU, NLG, and instruction-tuning evaluations, PrunePath achieves a favorable sparsity-performance trade-off compared with existing static pruning and MoEfication-based methods. We further implement Triton kernels for KV-cache decoding to translate the resulting structured sparsity into practical memory savings and measurable decoding-speed improvements. These results demonstrate the superior performance of PrunePath for building highly sparse, deployment-friendly large language models.
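The routing rule described in the abstract (softmax-normalized expert scores, then activation under a cumulative-mass threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `select_experts`, the parameter `mass_threshold`, and the example scores are all hypothetical, and the sketch assumes that "cumulative-mass threshold" means keeping the highest-probability experts until their summed probability reaches the threshold. It ignores the trained router, the MoEfication expert construction, and the Triton kernels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts(router_scores, mass_threshold):
    """Pick experts for one token under a cumulative probability budget.

    Assumption (not confirmed by the abstract): experts are taken in
    descending probability order until their cumulative softmax mass
    reaches `mass_threshold`. Returns the chosen expert indices.
    """
    if not 0.0 < mass_threshold <= 1.0:
        raise ValueError("mass_threshold must be in (0, 1]")
    probs = softmax(router_scores)
    # Stable sort: ties keep their original index order.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= mass_threshold:
            break
    return selected


# Hypothetical router scores for two tokens over four experts.
peaked_token = [4.0, 1.0, 0.5, 0.0]   # one expert dominates
flat_token = [1.0, 1.0, 1.0, 1.0]     # no expert dominates

print(select_experts(peaked_token, 0.9))   # a single expert suffices
print(select_experts(flat_token, 0.9))     # all four experts are needed
print(select_experts(peaked_token, 0.95))  # a higher budget adds an expert
```

Under this reading, the number of active experts varies per token (one for the peaked token, four for the flat one), and changing `mass_threshold` at inference time changes sparsity without retraining, which matches the abstract's description of adaptive expert counts and a sparsity knob from a single checkpoint.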


Related papers