Fugu-MT Paper Translation (Summary): STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Paper overview: STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

arxiv url: http://arxiv.org/abs/2408.01803v2
Date: Tue, 8 Oct 2024 03:18:19 GMT
Status: Translation complete
Last updated in system: 2024-11-08 13:07:08.155031
Title: STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Title (reference translation): STBLLM: Breaking the 1-bit barrier with structured binary LLMs
Authors: Peijie Dong, Lujun Li, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
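Abstract summary: We propose the first structural binarization method for compressing LLMs to below 1-bit precision. Some weights of binarized LLMs can be randomly flipped without significant performance degradation. The method outperforms other compressed binarization methods while significantly reducing memory requirements.
Reference score (in-house attention score): 28.70239743254508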
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision. Although LLMs have achieved remarkable performance, their memory-bound nature during the inference stage hinders their adoption on resource-constrained devices. Reducing weights to 1-bit precision through binarization substantially enhances computational efficiency. We observe that some weights in binarized LLMs can be randomly flipped without significant performance degradation, suggesting the potential for further compression. To exploit this, our STBLLM employs an N:M sparsity technique to achieve structural binarization of the weights. Specifically, we introduce a novel Standardized Importance (SI) metric, which considers weight magnitude and input feature norm to more accurately assess weight significance. Then, we propose a layer-wise approach, allowing different layers of the LLM to be sparsified with varying N:M ratios, thereby balancing compression and accuracy. Furthermore, we implement a fine-grained grouping strategy for less important weights, applying distinct quantization schemes to sparse, intermediate, and dense regions. Finally, we design a specialized CUDA kernel to support structural binarization. We conduct extensive experiments on LLaMA-1/2/3, the OPT family, and Mistral to evaluate the effectiveness of STBLLM. The results demonstrate that our approach outperforms other compressed binarized LLM methods while significantly reducing memory requirements.
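The abstract does not give the SI formula or the exact N:M selection procedure, so what follows is only a minimal NumPy sketch under stated assumptions. The helpers `importance_scores`, `nm_mask`, and `binarize_kept` are hypothetical. The score multiplies a per-row standardized |W| by the L2 norm of each input feature (one plausible reading of "weight magnitude and input feature norm"), the mask keeps the top N of every M consecutive weights in a row, and the surviving weights are sign-binarized with one scale per row (mean |w|, the least-squares scale for alpha * sign(w)).

```python
import numpy as np

def importance_scores(W, X):
    """Hypothetical importance score (an assumption, not the paper's SI formula).

    W: weight matrix of shape (out_features, in_features).
    X: calibration activations of shape (num_tokens, in_features).
    |W| is standardized per output row (divided by the row's mean |w|)
    and multiplied by the L2 norm of the matching input feature.
    """
    mag = np.abs(W)
    std_mag = mag / (mag.mean(axis=1, keepdims=True) + 1e-12)
    feat_norm = np.linalg.norm(X, axis=0)  # shape (in_features,)
    return std_mag * feat_norm[None, :]

def nm_mask(scores, n, m):
    """Keep the n highest-scoring weights in every group of m consecutive inputs."""
    out_f, in_f = scores.shape
    if in_f % m != 0:
        raise ValueError("in_features must be divisible by m")
    groups = scores.reshape(out_f, in_f // m, m)
    top = np.argsort(groups, axis=-1)[..., m - n:]  # indices of the n largest scores
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return mask.reshape(out_f, in_f)

def binarize_kept(W, mask):
    """Sign-binarize kept weights with one scale per row (mean |w| of kept entries)."""
    kept = np.where(mask, np.abs(W), 0.0)
    count = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    alpha = kept.sum(axis=1, keepdims=True) / count
    return alpha * np.sign(W) * mask

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))
X = rng.standard_normal((32, 16))
mask = nm_mask(importance_scores(W, X), n=2, m=4)  # 2:4 pattern
W_b = binarize_kept(W, mask)
print(mask.reshape(4, 4, 4).sum(axis=-1))  # every group of 4 keeps exactly 2
```

With a 2:4 pattern only half of the signs are stored, which gives the intuition for going below 1 bit per weight. The sketch ignores the index metadata needed to record kept positions, the layer-wise choice of N:M ratios, and the paper's CUDA kernel.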
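Abstract (reference translation): In this paper, we propose the first structural binarization method for compressing LLMs to below 1-bit precision. Although LLMs have achieved remarkable performance, their memory-bound nature during inference hinders their adoption on resource-constrained devices. Reducing weights to 1-bit precision through binarization greatly improves computational efficiency. We observe that some weights of binarized LLMs can be randomly flipped without significant performance degradation, suggesting potential for further compression. To exploit this, STBLLM adopts an N:M sparsity technique to achieve structural binarization of the weights. Specifically, we introduce a new Standardized Importance (SI) metric that considers weight magnitude and input feature norm to assess weight importance more accurately. We then propose a layer-wise approach that sparsifies different layers of the LLM with different N:M ratios, balancing compression and accuracy. Furthermore, we implement a fine-grained grouping strategy for less important weights, applying different quantization schemes to sparse, intermediate, and dense regions. Finally, we design a dedicated CUDA kernel to support structural binarization. We conduct extensive experiments on LLaMA-1/2/3, the OPT family, and Mistral to evaluate the effectiveness of STBLLM. The results show that our method outperforms other compressed binarized LLM methods while significantly reducing memory requirements.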
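The fine-grained grouping step for less important weights can be illustrated with another hypothetical sketch. Here "dense", "intermediate", and "sparse" are taken to mean magnitude bands of a weight vector (many small weights, fewer mid-sized ones, a sparse tail of large ones), and the breakpoints are fixed quantiles `q_low`/`q_high`. Both are assumptions, since the abstract does not say how the regions are defined or how their breakpoints are chosen. Each band gets its own sign-binarization scale.

```python
import numpy as np

def grouped_binarize(w, q_low=0.5, q_high=0.9):
    """Sign-binarize a 1-D weight vector with a separate scale per magnitude band.

    Hypothetical sketch: the quantile breakpoints q_low/q_high are illustrative
    assumptions, not values taken from the STBLLM paper.
    """
    mag = np.abs(w)
    t1, t2 = np.quantile(mag, [q_low, q_high])
    bands = [
        mag <= t1,                 # dense band: many small-magnitude weights
        (mag > t1) & (mag <= t2),  # intermediate band
        mag > t2,                  # sparse band: few large-magnitude weights
    ]
    out = np.zeros_like(w)
    for band in bands:
        if band.any():
            # mean |w| is the least-squares scale for alpha * sign(w) when w has no exact zeros
            alpha = mag[band].mean()
            out[band] = alpha * np.sign(w[band])
    return out

rng = np.random.default_rng(1)
w = rng.standard_normal(1000)
err_grouped = np.sum((w - grouped_binarize(w)) ** 2)
err_single = np.sum((w - np.abs(w).mean() * np.sign(w)) ** 2)
print(err_grouped <= err_single)  # True
```

Because the bands partition the vector and each band uses its own least-squares scale, the grouped reconstruction error cannot exceed the single-scale error. The price is storing three scales plus whatever is needed to record band membership.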

Related papers