Fugu-MT Paper Translation (Abstract): SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

Paper overview: SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

arxiv url: http://arxiv.org/abs/2505.03373v1
Date: Tue, 06 May 2025 09:47:53 GMT
Status: Translation complete
System update date: 2025-05-07 18:50:11.314513
Title: SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Title (reference translation): SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Authors: Hanyu Hu, Xiaoming Yuan
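Abstract summary: Large language models (LLMs) are often constrained by their computational and memory demands. We propose SPAP (Structured Pruning via Alternating Optimization and Penalty Methods), a novel and efficient structured pruning framework for LLMs grounded in optimization theory. Our work offers a practical, optimization-driven solution for pruning LLMs while preserving model performance.
Reference score (independently computed attention score): 2.1388885579612804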
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer from performance degradation, reliance on heuristic metrics, or expensive finetuning. To address these challenges, we propose SPAP (Structured Pruning via Alternating Optimization and Penalty Methods), a novel and efficient structured pruning framework for LLMs grounded in optimization theory. SPAP formulates the pruning problem through a mixed-integer optimization model, employs a penalty method that effectively makes pruning decisions to minimize pruning errors, and introduces an alternating minimization algorithm tailored to the splittable problem structure for efficient weight updates and performance recovery. Extensive experiments on OPT, LLaMA-3/3.1/3.2, and Qwen2.5 models demonstrate SPAP's superiority over state-of-the-art methods, delivering linear inference speedups (1.29$\times$ at 30% sparsity) and proportional memory reductions. Our work offers a practical, optimization-driven solution for pruning LLMs while preserving model performance.
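Abstract (reference translation): The deployment of large language models (LLMs) is often constrained by their computational and memory demands. Structured pruning offers a viable approach by eliminating entire network components, but existing methods suffer from performance degradation, reliance on heuristic metrics, or expensive fine-tuning. To address these challenges, we propose SPAP (Structured Pruning via Alternating Optimization and Penalty Methods), a novel and efficient structured pruning framework for LLMs grounded in optimization theory. SPAP formulates the pruning problem with a mixed-integer optimization model, employs a penalty method that effectively makes pruning decisions to minimize pruning error, and introduces an alternating minimization algorithm tailored to the splittable problem structure for efficient weight updates and performance recovery. Extensive experiments on OPT, LLaMA-3/3.1/3.2, and Qwen2.5 models show that SPAP outperforms state-of-the-art methods. Our work offers a practical, optimization-driven solution for pruning LLMs while preserving model performance.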
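
Illustrative sketch (not from the paper): the abstract describes structured pruning as choosing which whole components to remove and then updating the remaining weights to limit the pruning error. The toy Python example below shows that general idea on a single two-layer block, assuming NumPy. It selects hidden units with a simple heuristic score and refits the kept output weights by least squares. It does not implement SPAP's mixed-integer model, penalty method, or alternating minimization algorithm; the function name `prune_hidden_units`, the scoring rule, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np


def prune_hidden_units(H, W2, sparsity):
    """Toy structured pruning of hidden units (illustration only, not SPAP).

    H:  (n, h) hidden activations on calibration data
    W2: (h, d) output weights, so the dense output is H @ W2
    sparsity: fraction of hidden units to remove
    """
    h = W2.shape[0]
    keep = h - int(round(sparsity * h))
    Y = H @ W2  # dense output that the pruned block should reconstruct

    # Heuristic importance of unit j: Frobenius norm of its rank-one
    # contribution H[:, j] W2[j, :] to the output (an assumed scoring rule).
    scores = np.linalg.norm(H, axis=0) * np.linalg.norm(W2, axis=1)

    # Binary keep/remove decision per hidden unit.
    mask = np.zeros(h, dtype=bool)
    mask[np.argsort(scores)[-keep:]] = True

    # Error when units are dropped and the kept weights are left unchanged.
    err_mask_only = np.linalg.norm(H[:, mask] @ W2[mask] - Y)

    # Weight update: least-squares refit of the kept output weights.
    W2_new, *_ = np.linalg.lstsq(H[:, mask], Y, rcond=None)
    err_updated = np.linalg.norm(H[:, mask] @ W2_new - Y)
    return mask, W2_new, err_mask_only, err_updated


# Synthetic calibration data and weights (random, for demonstration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16))
W1 = rng.standard_normal((16, 32))
W2 = rng.standard_normal((32, 8))
H = np.maximum(X @ W1, 0.0)  # ReLU hidden activations

mask, W2_new, err_mask_only, err_updated = prune_hidden_units(H, W2, sparsity=0.3)
print("kept units:", int(mask.sum()), "of", mask.size)
print("error without weight update:", err_mask_only)
print("error with least-squares update:", err_updated)
```

Because the unchanged kept weights are themselves a feasible solution of the least-squares problem, the refit error can never exceed the mask-only error, which is the sense in which a weight update helps recover performance after whole units are removed.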

Related papers