Fugu-MT Paper Translation (Abstract): From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

Paper abstract: From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

arxiv url: http://arxiv.org/abs/2510.18030v1
Date: Mon, 20 Oct 2025 19:04:09 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.528472
Title: From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models
Title (reference translation): From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models
Authors: Ziyan Wang, Enmao Diao, Qi Le, Pu Wang, Minwoo Lee, Shu-ping Yeh, Evgeny Stupachenko, Hao Feng, Li Yang,
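Abstract summary: GISP (Global Iterative Structured Pruning) removes attention heads and MLP channels using first-order, loss-based importance weights aggregated at the structure level with block-wise normalization. An iterative schedule, rather than one-shot pruning, stabilizes accuracy at higher sparsity and mitigates perplexity collapse without requiring intermediate fine-tuning. Because importance is defined by a model-level loss, GISP naturally supports task-specific objectives.
Reference score (independently computed attention score): 27.774067682004745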
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structured pruning is a practical approach to deploying large language models (LLMs) efficiently, as it yields compact, hardware-friendly architectures. However, the dominant local paradigm is task-agnostic: by optimizing layer-wise reconstruction rather than task objectives, it tends to preserve perplexity or generic zero-shot behavior but fails to capitalize on modest task-specific calibration signals, often yielding limited downstream gains. We revisit global structured pruning and present GISP (Global Iterative Structured Pruning), a post-training method that removes attention heads and MLP channels using first-order, loss-based importance weights aggregated at the structure level with block-wise normalization. An iterative schedule, rather than one-shot pruning, stabilizes accuracy at higher sparsity and mitigates perplexity collapse without requiring intermediate fine-tuning; the pruning trajectory also forms nested subnetworks that support a "prune-once, deploy-many" workflow. Furthermore, because importance is defined by a model-level loss, GISP naturally supports task-specific objectives; we instantiate perplexity for language modeling and a margin-based objective for decision-style tasks. Extensive experiments show that across Llama2-7B/13B, Llama3-8B, and Mistral-0.3-7B, GISP consistently lowers WikiText-2 perplexity and improves downstream accuracy, with especially strong gains at 40-50% sparsity; on DeepSeek-R1-Distill-Llama-3-8B with GSM8K, task-aligned calibration substantially boosts exact-match accuracy.
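Abstract (reference translation): Structured pruning is a practical approach to efficiently deploying large language models (LLMs). However, the dominant local paradigm is task-agnostic: by optimizing layer-wise reconstruction rather than task objectives, it tends to preserve perplexity and generic zero-shot behavior but fails to exploit modest task-specific calibration signals, which often limits downstream gains. We revisit global structured pruning and present GISP (Global Iterative Structured Pruning), a post-training method that removes attention heads and MLP channels using first-order, loss-based importance weights aggregated at the structure level with block-wise normalization. An iterative schedule, rather than one-shot pruning, stabilizes accuracy at high sparsity and mitigates perplexity collapse without requiring intermediate fine-tuning. Furthermore, because importance is defined by a model-level loss, GISP naturally supports task-specific objectives. Extensive experiments show that on Llama2-7B/13B, Llama3-8B, and Mistral-0.3-7B, GISP consistently lowers WikiText-2 perplexity and improves downstream accuracy, with especially strong gains at 40-50% sparsity.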
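
The loop described in the abstract (score structures with a first-order, loss-based signal, normalize per block, then prune globally over several steps) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the score `|sum(w * g)|`, the L2 block normalization, the linear sparsity schedule, and every function and variable name below are assumptions made for illustration only.

```python
import math

def structure_scores(weights, grads):
    """Assumed first-order score per structure (head/channel): |sum_i w_i * g_i|."""
    return [abs(sum(w * g for w, g in zip(ws, gs)))
            for ws, gs in zip(weights, grads)]

def blockwise_normalize(scores_by_block, mask):
    """Divide each block's scores by the L2 norm of its surviving structures
    (assumed normalization) so scores are comparable across blocks."""
    out = {}
    for block, scores in scores_by_block.items():
        live = [s for s, keep in zip(scores, mask[block]) if keep]
        norm = math.sqrt(sum(s * s for s in live)) or 1.0
        out[block] = [s / norm for s in scores]
    return out

def global_iterative_prune(score_fn, sizes, target_sparsity, steps):
    """Prune the globally lowest-scoring structures over `steps` rounds.

    score_fn(mask) returns {block: [raw score per structure]}; in a real
    setting it would be recomputed on the currently pruned model.
    Returns the mask after every round (the pruning trajectory).
    Note: this toy version does not stop a block from being emptied.
    """
    mask = {b: [True] * n for b, n in sizes.items()}
    total = sum(sizes.values())
    trajectory = []
    for step in range(1, steps + 1):
        # Linear schedule (assumed): cumulative removals after this round.
        goal = round(total * target_sparsity * step / steps)
        removed = sum(not keep for m in mask.values() for keep in m)
        scores = blockwise_normalize(score_fn(mask), mask)
        candidates = sorted((s, b, i) for b, ss in scores.items()
                            for i, s in enumerate(ss) if mask[b][i])
        for _, b, i in candidates[:max(goal - removed, 0)]:
            mask[b][i] = False
        trajectory.append({b: list(m) for b, m in mask.items()})
    return trajectory

# Toy example: two blocks whose raw scores differ in scale by 10x.
sizes = {"attn0": 4, "mlp0": 4}
raw = {"attn0": [4.0, 3.0, 2.0, 1.0], "mlp0": [0.4, 0.3, 0.2, 0.1]}
trajectory = global_iterative_prune(lambda mask: raw, sizes, 0.5, steps=2)
final_mask = trajectory[-1]
```

In this toy run the raw scores of `mlp0` are ten times smaller than those of `attn0`, so a global ranking without normalization would remove all of `mlp0` first; with per-block normalization the removals are spread across both blocks. Because structures are only ever removed, each mask in `trajectory` keeps a subset of the previous one, which mirrors the nested-subnetwork property the abstract mentions.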

List of related papers