Fugu-MT Paper Translation (Abstract): The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

Paper summary: The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

arxiv url: http://arxiv.org/abs/2510.23652v1
Date: Sat, 25 Oct 2025 16:40:17 GMT
Status: Translation complete
System update date: 2025-10-29 15:35:36.31461
Title: The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models
Title (reference translation): Structural Scalpel: Automating Contiguous Layer Pruning for Large Language Models
Authors: Yao Lu, Yuqi Li, Wenbin Xie, Shanqing Yu, Qi Xuan, Zhaowei Zhu, Shiping Wen
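Abstract summary: We propose CLP, a novel continuous layer pruning framework for large language models. CLP uses a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning. CLP can be combined seamlessly with quantization to compress the model further with only a slight performance loss.
Reference score (independently computed attention score): 33.90597962418094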
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. To this end, layer pruning has been proposed to reduce the computational overhead by directly removing redundant layers. However, existing layer pruning methods typically rely on hand-crafted metrics to evaluate and remove individual layers, while ignoring the dependencies between layers. This can disrupt the model's information flow and severely degrade performance. To address these issues, we propose CLP, a novel continuous layer pruning framework that introduces two key innovations: a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning via gradient-based optimization; and a cutoff endpoint tuning strategy that effectively restores model performance by fine-tuning only the layers adjacent to the pruned segments. Extensive experiments across multiple model architectures (including LLaMA2, LLaMA3 and Qwen) and sizes (from $7$B to $70$B parameters) show that CLP significantly outperforms existing state-of-the-art baselines. For example, at a pruning rate of $20\%$, CLP achieves an average performance retention of $95.34\%$ on LLaMA3-70B, outperforming baselines by $4.29\%$-$30.52\%$. Furthermore, CLP can be seamlessly combined with quantization to further compress the model with only a slight performance loss.
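Abstract (reference translation): Large language models (LLMs) have achieved groundbreaking results in many fields, but their large size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. Layer pruning has been proposed to reduce this computational overhead by directly removing redundant layers. However, existing layer pruning methods typically rely on hand-crafted metrics to evaluate and remove individual layers while ignoring the dependencies between layers, which can disrupt the model's information flow and severely degrade performance. To address these issues, we propose CLP, a novel continuous layer pruning framework with two key innovations: a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning via gradient-based optimization, and a cutoff endpoint tuning strategy that effectively restores model performance by fine-tuning only the layers adjacent to the pruned segments. Extensive experiments across multiple model architectures (including LLaMA2, LLaMA3, and Qwen) and sizes (from 7B to 70B parameters) show that CLP significantly outperforms existing state-of-the-art baselines. For example, at a pruning rate of 20%, CLP achieves an average performance retention of 95.34% on LLaMA3-70B, outperforming baselines by 4.29% to 30.52%. Furthermore, CLP can be combined seamlessly with quantization to compress the model further with only a slight performance loss.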
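
The two ideas in the abstract, removing one unbroken block of layers and then fine-tuning only the layers next to the cut, can be illustrated with a small sketch. This is not the authors' implementation: the function names `segment_length` and `prune_contiguous`, the 32-layer stack, and the start index 20 are all hypothetical, and the sketch assumes the pruning rate is measured as a fraction of layers. In CLP the segment position would come from the differentiable concave gate; here it is simply chosen by hand.

```python
from typing import List, Tuple


def segment_length(num_layers: int, pruning_rate: float) -> int:
    """Layers to drop for a given rate (assumption: rate is a fraction of layers)."""
    return round(num_layers * pruning_rate)


def prune_contiguous(
    layers: List[str], start: int, length: int
) -> Tuple[List[str], List[str]]:
    """Remove layers[start:start+length] as one block.

    Returns the remaining layers and the layers adjacent to the cut,
    which are the only ones an endpoint-tuning step would update.
    """
    if length <= 0 or start < 0 or start + length > len(layers):
        raise ValueError("segment must lie inside the layer stack")
    kept = layers[:start] + layers[start + length:]
    endpoints = []
    if start > 0:
        endpoints.append(layers[start - 1])       # last layer before the cut
    if start + length < len(layers):
        endpoints.append(layers[start + length])  # first layer after the cut
    return kept, endpoints


# Hypothetical 32-layer stack; the start index is hand-picked, not learned.
layers = [f"layer_{i}" for i in range(32)]
n_drop = segment_length(len(layers), 0.20)
kept, endpoints = prune_contiguous(layers, start=20, length=n_drop)
print(len(kept), endpoints)  # 26 ['layer_19', 'layer_26']
```

In a real model, the layers in `endpoints` would be left trainable and every other layer frozen before a short fine-tuning pass.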
