Fugu-MT Paper Translation (Abstract): GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models

Paper summary: GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models

arxiv url: http://arxiv.org/abs/2603.13418v1
Date: Thu, 12 Mar 2026 19:20:37 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.18231
Title: GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
Title (reference translation): GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
Authors: Xiaoyun Liu, Divya Saxena, Jiannong Cao, Yuqing Zhao, Yiying Dong, Penghui Ruan,
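Abstract summary: We propose GPrune-LLM, a generalization-aware structured pruning framework. It first partitions neurons into behavior-consistent modules to localize ranking competition. For modules where activation-based scoring is unreliable, it switches to an activation-independent metric.
Reference score (independently computed attention score): 17.33640761554548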
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which introduces calibration bias and degrades downstream cross-task generalization. We observe that neurons exhibit heterogeneous distribution sensitivity, with distribution-robust neurons maintaining consistent rankings across datasets and distribution-sensitive neurons showing high cross-dataset ranking variance. Based on this, we identify two structural limitations in existing methods. First, ranking all neurons within a shared space causes distribution-sensitive neurons that strongly activate on calibration inputs to dominate, crowding out distribution-robust neurons critical for out-of-distribution tasks. Second, applying activation-based importance metrics uniformly can be unreliable. Distribution-sensitive neurons that infrequently activate on calibration data receive insufficient activation signal for accurate local ranking. To address these limitations, we propose GPrune-LLM, a generalization-aware structured pruning framework that explicitly accounts for neuron differences in cross-distribution behavior. We first partition neurons into behavior-consistent modules to localize ranking competition, then evaluate activation-based metric reliability per module according to distribution sensitivity and score magnitude. For modules where activation-based scoring is unreliable, we switch to an activation-independent metric. Finally, we adaptively learn module-wise sparsity. Extensive experiments across multiple downstream tasks demonstrate GPrune-LLM's consistent improvements in post-compression generalization, particularly at high sparsity, and reduced dependence on importance metric choice.
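Abstract (reference translation): Structured pruning is widely used to compress large language models (LLMs), but its effectiveness depends heavily on how neuron importance is estimated. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which introduces calibration bias and degrades downstream cross-task generalization. We observe that neurons differ in their sensitivity to the data distribution: distribution-robust neurons keep consistent rankings across datasets, while distribution-sensitive neurons show high cross-dataset ranking variance. Based on this observation, we identify two structural limitations of existing methods. First, when all neurons are ranked in a shared space, distribution-sensitive neurons that activate strongly on calibration inputs dominate and crowd out distribution-robust neurons that are critical for out-of-distribution tasks. Second, applying activation-based importance metrics uniformly can be unreliable, because distribution-sensitive neurons that activate only infrequently on calibration data receive too little activation signal for accurate local ranking. To address these limitations, we propose GPrune-LLM, a generalization-aware structured pruning framework that explicitly accounts for differences in neurons' cross-distribution behavior. We first partition neurons into behavior-consistent modules to localize ranking competition, then evaluate the reliability of activation-based metrics per module according to distribution sensitivity and score magnitude. For modules where activation-based scoring is unreliable, we switch to an activation-independent metric. Finally, we adaptively learn module-wise sparsity. Extensive experiments across multiple downstream tasks show that GPrune-LLM consistently improves post-compression generalization, particularly at high sparsity, and reduces dependence on the choice of importance metric.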
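The observation in the abstract (some neurons keep a stable importance rank across datasets while others do not) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify how modules are formed, how reliability is evaluated, or how sparsity is learned. The function names, the fixed variance threshold, the two-way split, and the toy scores below are all assumptions made only for illustration.

```python
from statistics import pvariance

def ranks(scores):
    """Rank neurons for one dataset (0 = highest importance score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, neuron in enumerate(order):
        rank[neuron] = position
    return rank

def rank_variance(scores_per_dataset):
    """Variance of each neuron's rank across calibration datasets."""
    per_dataset_ranks = [ranks(s) for s in scores_per_dataset]
    n_neurons = len(scores_per_dataset[0])
    return [pvariance([r[i] for r in per_dataset_ranks]) for i in range(n_neurons)]

def partition(variances, threshold):
    """Split neurons into (robust, sensitive) by a hypothetical variance threshold."""
    robust = [i for i, v in enumerate(variances) if v <= threshold]
    sensitive = [i for i, v in enumerate(variances) if v > threshold]
    return robust, sensitive

def prune_within_group(group, scores, sparsity):
    """Rank only inside the group and drop its lowest-scoring fraction."""
    n_drop = int(len(group) * sparsity)
    ordered = sorted(group, key=lambda i: scores[i])
    return sorted(ordered[n_drop:])

# Toy importance scores (made up): 3 calibration datasets x 4 neurons.
toy_scores = [
    [0.9, 0.8, 0.1, 0.5],
    [0.9, 0.1, 0.8, 0.5],
    [0.9, 0.5, 0.1, 0.8],
]

variances = rank_variance(toy_scores)
robust, sensitive = partition(variances, threshold=0.5)

# Local ranking: each group competes only with itself (scores from dataset 0).
kept_robust = prune_within_group(robust, toy_scores[0], sparsity=0.5)
kept_sensitive = prune_within_group(sensitive, toy_scores[0], sparsity=0.5)
```

Ranking inside each group means a neuron that fires strongly on one calibration set cannot push a consistently ranked neuron out of the kept set, which is the intuition behind localizing ranking competition. The sketch uses the same activation-style score in both groups; the abstract instead describes switching to an activation-independent metric where activation-based scoring is judged unreliable.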


Related papers