Fugu-MT Paper Translation (Summary): SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Paper summary: SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

arxiv url: http://arxiv.org/abs/2604.03258v1
Date: Thu, 12 Mar 2026 04:01:11 GMT
Status: Translation complete
Last updated in system: 2026-04-12 18:41:08.561239
Title: SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
Authors: Xinhao Huang, You-Liang Huang, Zeyi Wen,
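Abstract summary: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billions of parameters pose deployment challenges. We propose "SoLA", a novel training-free compression method for LLMs. Without post-training, SoLA achieves notable improvements in both language modeling and downstream task accuracy.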
Reference score (site-computed attention score): 14.317197422277923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but the billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", which leverages Soft activation sparsity and Low-rAnk decomposition. SoLA can identify and retain a minority of components significantly contributing to inference, while compressing the majority through low-rank decomposition, based on our analysis of the activation pattern in the feed-forward network (FFN) of modern LLMs. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank allocation strategy to assign appropriate truncation positions for different weight matrices. We conduct extensive experiments on LLaMA-2-7B/13B/70B and Mistral-7B models across a variety of benchmarks. SoLA exhibits remarkable improvement in both language modeling and downstream task accuracy without post-training. For example, with a 30% compression rate on the LLaMA-2-70B model, SoLA surpasses the state-of-the-art method by reducing perplexity from 6.95 to 4.44 and enhancing downstream task accuracy by 10%.
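The abstract describes the method only at a high level. The following is a minimal Python/NumPy sketch of the general idea, not the paper's algorithm. It keeps a small set of salient input channels of one weight matrix dense and replaces the remaining rows with a truncated-SVD factorization sized to a parameter budget. Several choices here are illustrative assumptions rather than details from the source. The function names split_and_factorize and apply_compressed are hypothetical. The importance score is activation norm times weight-row norm. The parameters keep_frac and compress_rate are placeholders. The SVD is plain and weight-space, not activation-aware. The sketch also uses a single budget-derived rank, whereas SoLA's adaptive strategy assigns a truncation position per weight matrix.

```python
import numpy as np


def split_and_factorize(W, H, keep_frac=0.05, compress_rate=0.3):
    """Hypothetical sketch: keep salient input channels of W dense, low-rank the rest.

    W: (n_in, n_out) weight applied as H @ W (e.g. an FFN down projection).
    H: (n_tokens, n_in) calibration activations that feed W.
    Returns kept/remaining channel indices, the dense rows, and factors A, B.
    """
    n_in, n_out = W.shape
    # Assumed importance proxy: activation energy per channel times the norm
    # of the matching weight row (not necessarily SoLA's exact metric).
    score = np.linalg.norm(H, axis=0) * np.linalg.norm(W, axis=1)
    k = max(1, int(round(keep_frac * n_in)))
    order = np.argsort(score)[::-1]
    keep, rest = np.sort(order[:k]), np.sort(order[k:])

    # Parameter budget: dense rows cost k * n_out; each rank-1 term of the
    # factorized block costs len(rest) + n_out parameters.
    budget = (1.0 - compress_rate) * n_in * n_out
    rank = int((budget - k * n_out) // (len(rest) + n_out))
    rank = max(0, min(rank, len(rest), n_out))

    # Plain truncated SVD of the remaining rows (weight-space, not activation-aware).
    U, S, Vt = np.linalg.svd(W[rest], full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (len(rest), rank)
    B = Vt[:rank]               # (rank, n_out)
    return keep, rest, W[keep], A, B


def apply_compressed(H, keep, rest, W_keep, A, B):
    # Exact path for the salient channels plus a low-rank path for the rest.
    return H[:, keep] @ W_keep + (H[:, rest] @ A) @ B


# Toy demo on random data with a few "heavy-hitter" channels.
rng = np.random.default_rng(0)
n_in, n_out, n_tokens = 256, 64, 512
W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
H = rng.standard_normal((n_tokens, n_in))
H[:, :8] *= 20.0  # channels 0-7 carry much larger activations

keep, rest, W_keep, A, B = split_and_factorize(W, H)
Y = H @ W
Y_hat = apply_compressed(H, keep, rest, W_keep, A, B)
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
n_params = W_keep.size + A.size + B.size
print(f"kept={len(keep)} rank={A.shape[1]} params={n_params}/{W.size} rel_err={rel_err:.3f}")
```

A rank-r factorization of an a×b block stores r(a+b) numbers instead of ab, which is why the budget line divides by len(rest) + n_out. With the defaults above, 13 of 256 channels stay dense and the remaining 243 rows are factorized at rank 34. Together they use 11,270 of the original 16,384 parameters, within the 30% compression budget.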


List of related papers