Fugu-MT Paper Translation (Summary): Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation

Paper overview: Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation

arxiv url: http://arxiv.org/abs/2510.15304v1
Date: Fri, 17 Oct 2025 04:27:06 GMT
Status: Translation complete
Last updated in system: 2025-10-20 20:17:34.471752
Title: Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Authors: Fei Wang, Li Shen, Liang Ding, Chao Xue, Ye Liu, Changxing Ding,
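Abstract summary: Large language models excel at natural language processing tasks, but their massive size leads to high computational and storage demands. Recent studies have attempted to reduce model size through layer-wise pruning. We re-examine the structured pruning paradigm and uncover several key limitations.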
Reference score (site-computed attention score): 43.822941944402544
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models excel at natural language processing tasks, but their massive size leads to high computational and storage demands. Recent works have sought to reduce their model size through layer-wise structured pruning. However, they tend to ignore retaining the capabilities in the pruned part. In this work, we re-examine structured pruning paradigms and uncover several key limitations: 1) notable performance degradation due to direct layer removal, 2) incompetent linear weight layer aggregation, and 3) the lack of effective post-training recovery mechanisms. To address these limitations, we propose CoMe, including a progressive layer pruning framework with a Concatenation-based Merging technology and a hierarchical distillation post-training process. Specifically, we introduce a channel sensitivity metric that utilizes activation intensity and weight norms for fine-grained channel selection. Subsequently, we employ a concatenation-based layer merging method to fuse the most critical channels across adjacent layers, enabling progressive model size reduction. Finally, we propose a hierarchical distillation protocol that leverages the correspondences between the original and pruned model layers established during pruning, thereby enabling efficient knowledge transfer. Experiments on seven benchmarks show that CoMe achieves state-of-the-art performance; when pruning 30% of LLaMA-2-7b's parameters, the pruned model retains 83% of its original average accuracy. Our code is available at https://github.com/MPI-Lab/CoMe.
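The abstract names two mechanisms without giving their formulas: a channel sensitivity metric built from activation intensity and weight norms, and a concatenation-based merge of the most critical channels across adjacent layers. The Python sketch below is one plausible reading under stated assumptions, not the authors' implementation (that lives in the linked repository). It assumes each layer is a two-layer ReLU feed-forward block, scores a channel as its mean absolute activation times the L2 norm of its down-projection column, and merges two adjacent blocks by concatenating their top-scoring channels into one wider block, which treats both blocks as reading the same residual input. The names `ffn`, `channel_sensitivity`, `top_k`, and `concat_merge` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def ffn(w_up: Matrix, w_down: Matrix, x: Vector) -> Vector:
    """Toy ReLU FFN: w_up is (channels x d_model), w_down is (d_model x channels)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]


def channel_sensitivity(w_up: Matrix, w_down: Matrix, samples: Sequence[Vector]) -> List[float]:
    """Hypothetical score: mean |activation| of each channel over calibration samples,
    times the L2 norm of the w_down column that writes it back to the residual stream."""
    scores = []
    for c in range(len(w_up)):
        acts = [max(0.0, sum(w * xi for w, xi in zip(w_up[c], x))) for x in samples]
        intensity = sum(abs(a) for a in acts) / len(acts)  # abs kept for non-ReLU activations
        col_norm = math.sqrt(sum(row[c] ** 2 for row in w_down))
        scores.append(intensity * col_norm)
    return scores


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring channels, returned in their original order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


def concat_merge(w_up_a: Matrix, w_down_a: Matrix, keep_a: List[int],
                 w_up_b: Matrix, w_down_b: Matrix, keep_b: List[int]):
    """Concatenate the kept channels of two adjacent FFNs into one wider FFN.
    Assumption: both blocks see (approximately) the same input, so their outputs add."""
    w_up = [w_up_a[i] for i in keep_a] + [w_up_b[i] for i in keep_b]
    w_down = [
        [row_a[i] for i in keep_a] + [row_b[i] for i in keep_b]
        for row_a, row_b in zip(w_down_a, w_down_b)
    ]
    return w_up, w_down


# Toy example: two adjacent blocks with d_model = 2 and 3 channels each.
w_up_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_down_a = [[1.0, 0.0, 0.1], [0.0, 2.0, 0.1]]
w_up_b = [[1.0, -1.0], [-1.0, 1.0], [2.0, 0.0]]
w_down_b = [[0.5, 0.1, 1.0], [0.5, -0.1, 0.0]]
samples = [[1.0, 2.0], [2.0, 1.0]]  # stand-in calibration inputs

keep_a = top_k(channel_sensitivity(w_up_a, w_down_a, samples), 2)
keep_b = top_k(channel_sensitivity(w_up_b, w_down_b, samples), 2)
w_up_m, w_down_m = concat_merge(w_up_a, w_down_a, keep_a, w_up_b, w_down_b, keep_b)
print(keep_a, keep_b)  # [0, 1] [0, 2]
```

With every channel kept, the merged block reproduces the sum of the two blocks' outputs (up to floating-point rounding); the size reduction comes from dropping low-score channels. Real LLaMA MLPs use gated SwiGLU projections, which this toy omits.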
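The abstract states that the hierarchical distillation step reuses the correspondence between original and pruned layers established during pruning. A minimal Python sketch of that bookkeeping follows, assuming (our assumption, not stated in the abstract) that each pruned layer records the original layers it absorbed and is supervised by the hidden state of the last of them through mean-squared error. Hidden states are plain lists here; in practice they would be model activations. `merge_adjacent`, `correspondence`, and `hierarchical_distill_loss` are hypothetical names.

```python
from typing import List, Sequence

Groups = List[List[int]]


def merge_adjacent(groups: Groups, j: int) -> Groups:
    """Fuse pruned layer j with layer j+1; each group lists the original layers it covers."""
    return groups[:j] + [groups[j] + groups[j + 1]] + groups[j + 2:]


def correspondence(groups: Groups) -> List[int]:
    """Original layer whose output each pruned layer should match (the last one it covers)."""
    return [g[-1] for g in groups]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hierarchical_distill_loss(student_hidden, teacher_hidden, groups: Groups) -> float:
    """Average per-layer MSE between each student layer output and its matched teacher output."""
    assert len(student_hidden) == len(groups), "one student hidden state per pruned layer"
    targets = correspondence(groups)
    losses = [mse(s, teacher_hidden[t]) for s, t in zip(student_hidden, targets)]
    return sum(losses) / len(losses)


groups = [[i] for i in range(6)]    # 6 original layers, initially one-to-one
groups = merge_adjacent(groups, 2)  # fuse original layers 2 and 3
groups = merge_adjacent(groups, 3)  # fuse original layers 4 and 5
print(groups)  # [[0], [1], [2, 3], [4, 5]]

teacher_hidden = [[float(i), float(i)] for i in range(6)]  # toy teacher outputs per layer
student_hidden = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [5.0, 4.0]]
loss = hierarchical_distill_loss(student_hidden, teacher_hidden, groups)
```

Recording the groups during progressive merging is what makes the layer-to-layer targets available for free at distillation time.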


Related papers