Fugu-MT Paper Translation (Abstract): Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining

Paper overview: Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining

arxiv url: http://arxiv.org/abs/2508.15828v1
Date: Mon, 18 Aug 2025 16:19:22 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.097248
Title: Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining
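Title (reference translation): Z-Pruner: Post-Training Pruning of Large Language Models
Authors: Samiul Basir Bhuiyan, Md. Sazzad Hossain Adib, Mohammed Aman Bhuiyan, Muhammad Rafsan Kabir, Moshiur Farazi, Shafin Rahman, Nabeel Mohammed
Abstract summary: Post-training pruning is a promising approach for reducing model size and inference latency without retraining. Z-Pruner is a novel post-training pruning method designed to induce sparsity in pretrained large language models without any retraining. Z-Pruner surpasses state-of-the-art pruning methods that require intensive weight updates.
Reference score (independently computed attention score): 6.578456055730258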
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have rapidly advanced in recent years, achieving remarkable performance across a wide range of natural language processing tasks. However, this progress has come at the cost of increasingly large model sizes, which pose significant challenges for deployment, scalability, and energy efficiency. To address these limitations, post-training pruning has emerged as a promising approach for reducing model size and inference latency without the need for retraining. Despite these advantages, many existing pruning methods result in substantial performance degradation or require computationally expensive fine-tuning. In this work, we introduce Z-Pruner, a novel post-training pruning method designed to induce sparsity in pretrained LLMs without any retraining. Unlike conventional approaches, Z-Pruner leverages both weight update magnitudes and activation patterns to identify and eliminate redundant parameters more effectively. Our method is model-agnostic, efficient, and easy to implement. We evaluate Z-Pruner using multiple widely-used LLM architectures, including LLaMA-2, LLaMA-3, and OPT, across a diverse set of standard language benchmarks. Experimental results demonstrate that Z-Pruner surpasses state-of-the-art pruning methods that require intensive weight updates. Specifically, Z-Pruner achieves the lowest perplexity scores and the highest overall average score for zero-shot accuracy. We have made the corresponding codes publicly available at https://github.com/sazzadadib/Z-Pruner.
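Abstract (reference translation): Large language models (LLMs) have advanced rapidly in recent years and achieve remarkable performance on a wide range of natural language processing tasks. This progress has come at the cost of ever larger model sizes, which create significant challenges for deployment, scalability, and energy efficiency. To address these limitations, post-training pruning has emerged as a promising approach for reducing model size and inference latency without retraining. Despite these advantages, many existing pruning methods either degrade performance substantially or require computationally expensive fine-tuning. This work introduces Z-Pruner, a novel post-training pruning method that induces sparsity in pretrained LLMs without any retraining. Unlike conventional approaches, Z-Pruner uses both weight update magnitudes and activation patterns to identify and eliminate redundant parameters more effectively. The method is model-agnostic, efficient, and easy to implement. Z-Pruner is evaluated on several widely used LLM architectures, including LLaMA-2, LLaMA-3, and OPT, across a diverse set of standard language benchmarks. The experiments show that Z-Pruner surpasses state-of-the-art pruning methods that require intensive weight updates. Specifically, Z-Pruner achieves the lowest perplexity scores and the highest overall average score for zero-shot accuracy. The corresponding code is publicly available at https://github.com/sazzadadib/Z-Pruner.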
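The abstract does not spell out Z-Pruner's scoring rule, so the following is only a minimal sketch of the general idea of post-training pruning that combines weight magnitudes with activation statistics. It assumes a Wanda-style score (absolute weight times the L2 norm of the matching input feature over a calibration batch), which is a stand-in and not the authors' method. The function name `prune_by_weight_and_activation`, the `sparsity` parameter, and the random calibration data are all hypothetical.

```python
import numpy as np


def prune_by_weight_and_activation(weight, calib_inputs, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row, with no retraining.

    weight:       (out_features, in_features) matrix of a linear layer.
    calib_inputs: (n_samples, in_features) activations fed into that layer.
    sparsity:     fraction of weights to remove per output row.

    ASSUMPTION: score = |W| * ||x||_2 per input feature (Wanda-style),
    used here as a placeholder for the scoring in the paper.
    """
    # L2 norm of each input feature across the calibration samples.
    act_norm = np.linalg.norm(calib_inputs, axis=0)  # (in_features,)

    # Importance score; act_norm broadcasts across the output rows.
    score = np.abs(weight) * act_norm

    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned

    # Indices of the n_prune smallest scores in every row.
    prune_idx = np.argpartition(score, n_prune - 1, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, prune_idx, 0.0, axis=1)
    return pruned


# Toy usage with random data standing in for a real layer and calibration set.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(32, 8))
W_sparse = prune_by_weight_and_activation(W, X, sparsity=0.5)
print("fraction of zeroed weights:", (W_sparse == 0).mean())
```

The surviving weights are left unchanged, which is what distinguishes this family of methods from approaches that update the remaining weights after pruning. The official implementation is in the repository linked in the abstract.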

Related papers