Fugu-MT Paper Translation (Abstract): SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

Paper overview: SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

arxiv url: http://arxiv.org/abs/2604.04493v1
Date: Mon, 06 Apr 2026 07:36:48 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:19.135116
Title: SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
Authors: Ziwei Li, Yuang Ma, Yi Kang
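Abstract summary: We propose SLaB, a novel framework that decomposes linear layer weights into three complementary components. SLaB achieves state-of-the-art performance, reducing perplexity by up to 36% compared to existing methods.
Reference score (independently computed attention metric): 6.9575993729793595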
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid growth of large language models (LLMs) presents significant deployment challenges due to their massive computational and memory demands. While model compression, such as network pruning, offers potential solutions, most existing methods often fail to maintain good performance at high compression ratios. To address this, we propose SLaB, a novel framework that decomposes each linear layer weight into three complementary components: a sparse matrix, a low-rank matrix, and a binary matrix. SLaB eliminates the need for retraining and leverages activation-aware pruning scores to guide the decomposition process. Experiments on Llama-family models demonstrate that SLaB achieves state-of-the-art performance, reducing perplexity by up to 36% compared to existing methods at 50% compression and improving accuracy by up to 8.98% over the baseline on zero-shot tasks.
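The abstract names the three components but does not say how they are computed. As a rough illustration only, the NumPy sketch below splits a weight matrix into a sparse part, a low-rank part, and a scaled binary part. Everything specific here is an assumption and not taken from the paper: the function name `slab_like_decompose`, the parameters `keep_frac` and `rank`, the scoring rule (weight magnitude times input-activation norm), the order of the three steps, and the per-row scale `alpha`.

```python
import numpy as np

def slab_like_decompose(W, act_norm, keep_frac=0.25, rank=4):
    """Illustrative split of W into sparse + low-rank + scaled-binary parts.

    Hypothetical sketch; not the authors' algorithm.
    """
    # Assumed activation-aware score: |W_ij| times the norm of input feature j.
    score = np.abs(W) * act_norm[None, :]
    k = max(1, int(keep_frac * W.size))
    # Value of the k-th largest score; entries at or above it go into S.
    thresh = np.partition(score.ravel(), -k)[-k]
    S = np.where(score >= thresh, W, 0.0)

    # Low-rank part: truncated SVD of what the sparse part did not capture.
    R = W - S
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Binary part: signs of the remaining residual, one scale per output row.
    R2 = R - L
    B = np.where(R2 >= 0, 1.0, -1.0)
    alpha = np.abs(R2).mean(axis=1, keepdims=True)
    return S, L, alpha, B

# Toy usage on random data (shapes and values are arbitrary).
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
act_norm = np.abs(rng.standard_normal(32)) + 0.1
S, L, alpha, B = slab_like_decompose(W, act_norm)
W_hat = S + L + alpha * B
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch each stage fits the residual left by the previous one, so the reconstruction error cannot increase from stage to stage. The paper's actual procedure may differ in every one of these choices.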

Related papers