Fugu-MT Paper Translation (Summary): Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Paper summary: Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

arxiv url: http://arxiv.org/abs/2311.04902v1
Date: Wed, 8 Nov 2023 18:59:54 GMT
Status: Translation complete
Last updated in system: 2023-11-09 14:48:38.320767
Title: Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models
Title (reference translation): Beyond Size: How Pruning Decisions Are Made in Large Language Models
Authors: Rocktim Jyoti Das and Liqun Ma and Zhiqiang Shen
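Abstract summary: Large language models (LLMs) with a billion or more parameters are prime targets for network pruning. We propose a novel sparsity-centric pruning method for pretrained LLMs, called Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner substantially outperforms magnitude pruning, Wanda (weights+activations), and SparseGPT (weights+activations+weight update).
Reference score (independently calculated attention score): 27.488197964786806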
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to remove a portion of the network weights without compromising performance. Prior approaches such as weight magnitude, SparseGPT, and Wanda either focused solely on weights or combined weights with activations to determine sparsity. However, they overlooked the informative gradients derived from pretrained large language models. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion and operates in a training-free manner: it uses properly normalized gradients from a few calibration samples to determine the pruning importance score. It substantially outperforms competitive counterparts like SparseGPT and Wanda on multiple benchmarks. Intriguingly, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, which mirrors the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner requires no subsequent retraining or weight updates, which keeps it as simple as its counterparts. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations), and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.
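The first-order Taylor argument behind a gradient-based importance score can be written out as follows. This is a standard derivation that assumes "pruning" weight $w_{ij}$ means setting it to zero. The aggregation in the last line is an illustrative assumption: it takes the ℓ2 norm of per-sample gradients over $N$ calibration samples. The paper's exact normalization may differ.

```latex
% Taylor expansion of the loss around the pretrained weights w
\mathcal{L}(\mathbf{w} + \delta\mathbf{w}) - \mathcal{L}(\mathbf{w})
  = \nabla\mathcal{L}(\mathbf{w})^{\top}\delta\mathbf{w}
  + \tfrac{1}{2}\,\delta\mathbf{w}^{\top}\mathbf{H}\,\delta\mathbf{w}
  + O(\lVert\delta\mathbf{w}\rVert^{3})

% Pruning w_{ij}: \delta w_{ij} = -w_{ij}, all other entries 0; keep only the first-order term
\Delta\mathcal{L}_{ij} \approx -\,g_{ij}\,w_{ij},
\qquad g_{ij} = \frac{\partial \mathcal{L}}{\partial w_{ij}}

% Importance over N calibration samples (assumed: l2 norm of per-sample gradients)
S_{ij} = \lvert w_{ij}\rvert \cdot
  \Bigl(\sum_{n=1}^{N} \bigl(g_{ij}^{(n)}\bigr)^{2}\Bigr)^{1/2}
```

One plausible reason to aggregate per-sample gradient magnitudes rather than average the signed gradients: for a well-trained model, the mean gradient is close to zero, so the averaged first-order term would nearly vanish for every weight.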
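Abstract (reference translation): Large language models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to remove part of the network's weights without degrading performance. Earlier approaches such as weight magnitude, SparseGPT, and Wanda either focused solely on weights or combined weights with activations to induce sparsity. However, they overlooked the informative gradients obtained from pretrained large language models. In this paper, we propose a sparsity-centric pruning method for pretrained LLMs called Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner uses the first-order term of the Taylor expansion and, in a training-free manner, determines the pruning importance score from properly normalized gradients computed on a few calibration samples. It substantially outperforms competitors such as SparseGPT and Wanda on multiple benchmarks. Interestingly, once gradients are incorporated, the unstructured pruning method tends to reveal some structural patterns after pruning, reflecting the geometric interdependence inherent in the LLMs' parameter structure. Moreover, GBLM-Pruner needs no subsequent retraining or weight updates, which keeps it as simple as the other methods. Extensive evaluations of LLaMA-1 and LLaMA-2 on various language benchmarks and perplexity show that GBLM-Pruner outperforms magnitude pruning, Wanda (weights+activations), and SparseGPT (weights+activations+weight update) by a considerable margin. Our code and models are available at https://github.com/rocktimjyotidas/gblm-pruner.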
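The abstract contrasts three families of pruning scores: weights only (magnitude), weights plus activations (Wanda), and scores that also use gradients (GBLM-Pruner). The toy sketch below applies all three to one linear layer and shows that each can keep a different set of weights. It is not the authors' implementation. The following are hypothetical choices made for illustration:

- **Per-row comparison:** scores are compared within each output row, as Wanda does.
- **Gradient term:** gradients are aggregated as $|w_{ij}| \cdot \lVert g_{ij}^{(1..N)} \rVert_2$.
- **Combining with Wanda:** the gradient term is added to an `alpha`-weighted Wanda term.

The function names are hypothetical, and pure Python lists stand in for real tensors.

```python
"""Toy comparison of pruning scores for one linear layer (hypothetical sketch).

Assumptions (not taken from the GBLM-Pruner code):
- W: out_features x in_features weights, as a list of rows.
- X: calibration activations, one row per token, in_features columns.
- G: per-sample gradients dL/dW, shape [samples][out_features][in_features].
- The gradient score |w| * ||g||_2 + alpha * (Wanda term) is illustrative only.
"""
import math
from typing import List

Matrix = List[List[float]]


def magnitude_scores(W: Matrix) -> Matrix:
    # Magnitude pruning: importance is |w_ij| alone.
    return [[abs(w) for w in row] for row in W]


def wanda_scores(W: Matrix, X: Matrix) -> Matrix:
    # Wanda-style: |w_ij| * ||X[:, j]||_2, the l2 norm of input feature j over tokens.
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in X)) for j in range(len(W[0]))]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in W]


def gradient_scores(W: Matrix, G: List[Matrix], X: Matrix, alpha: float) -> Matrix:
    # First-order Taylor term |w_ij| * l2 norm of per-sample gradients (assumed
    # aggregation), plus alpha times the Wanda-style activation term.
    wanda = wanda_scores(W, X)
    scores = []
    for i, row in enumerate(W):
        score_row = []
        for j, w in enumerate(row):
            g_norm = math.sqrt(sum(g[i][j] ** 2 for g in G))
            score_row.append(abs(w) * g_norm + alpha * wanda[i][j])
        scores.append(score_row)
    return scores


def prune_per_row(W: Matrix, scores: Matrix, sparsity: float) -> Matrix:
    # Unstructured pruning: in every output row, zero the `sparsity` fraction
    # of entries with the lowest scores; no retraining or weight update follows.
    pruned = []
    for row, score_row in zip(W, scores):
        k = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: score_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


if __name__ == "__main__":
    W = [[0.5, -0.1, 0.3, 0.2]]
    X = [[1.0, 4.0, 0.1, 1.0], [1.0, 3.0, 0.1, 1.0]]
    # Two calibration samples; the gradients for column 3 have opposite signs,
    # so their mean is small but their l2 norm is 5.0.
    G = [[[0.1, 0.0, 0.1, 3.0]], [[-0.1, 0.0, 0.1, -4.0]]]
    print(prune_per_row(W, magnitude_scores(W), 0.5))  # [[0.5, 0.0, 0.3, 0.0]]
    print(prune_per_row(W, wanda_scores(W, X), 0.5))  # [[0.5, -0.1, 0.0, 0.0]]
    print(prune_per_row(W, gradient_scores(W, G, X, alpha=1.0), 0.5))  # [[0.5, 0.0, 0.0, 0.2]]
```

At 50% sparsity the three scores keep three different masks. Only the gradient-aware score keeps the small weight in column 3, whose per-sample gradients are large even though they nearly cancel on average.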
