Fugu-MT Paper Translation (Abstract): DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization

Paper overview: DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization

arxiv url: http://arxiv.org/abs/2507.01027v1
Date: Wed, 18 Jun 2025 06:41:03 GMT
Status: Translation complete
Last updated in system: 2025-07-07 02:47:44.420398
Title: DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization
Title (reference translation): DBellQuant: Breaking the Bell with a Double-Bell Transformation for Post-Training Binarization of LLMs
Authors: Zijian Ye, Wei Huang, Yifei Yu, Tianhe Ren, Zhongrui Wang, Xiaojuan Qi
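Abstract summary: DBellQuant is a post-training quantization framework for large language models. It achieves nearly 1-bit weight compression and 6-bit activation quantization with minimal performance degradation. It sets a new state of the art by preserving superior model performance under aggressive weight and activation quantization.
Reference score (independently computed attention score): 38.333517224831624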
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) demonstrate remarkable performance but face substantial computational and memory challenges that limit their practical deployment. Quantization has emerged as a promising solution; however, its effectiveness is often limited by quantization errors arising from weight distributions that are not quantization-friendly and the presence of activation outliers. To address these challenges, we introduce DBellQuant, an innovative post-training quantization (PTQ) framework that achieves nearly 1-bit weight compression and 6-bit activation quantization with minimal performance degradation. DBellQuant uses the Learnable Transformation for Dual-Bell (LTDB) algorithm, which transforms single-bell weight distributions into dual-bell forms to reduce binarization errors and applies inverse transformations to smooth activations. DBellQuant sets a new state-of-the-art by preserving superior model performance under aggressive weight and activation quantization. For example, on the Wikitext2 dataset, DBellQuant achieves a perplexity of 14.39 on LLaMA2-13B with 6-bit activation quantization, significantly outperforming BiLLM's 21.35 without activation quantization, underscoring its potential in compressing LLMs for real-world applications.
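Abstract (reference translation): Large language models (LLMs) show remarkable performance but face computational and memory problems that limit their practical deployment. Quantization has emerged as a promising solution, but its effectiveness is often limited by quantization errors that arise from weight distributions that are not quantization-friendly and from the presence of activation outliers. To address these challenges, we introduce DBellQuant, an innovative post-training quantization (PTQ) framework that achieves nearly 1-bit weight compression and 6-bit activation quantization with minimal performance degradation. DBellQuant uses the Learnable Transformation for Dual-Bell (LTDB) algorithm, which transforms single-bell weight distributions into dual-bell forms to reduce binarization errors and applies the inverse transformations to smooth activations. DBellQuant sets a new state of the art by preserving superior model performance under aggressive weight and activation quantization. For example, on the Wikitext2 dataset, DBellQuant achieves a perplexity of 14.39 on LLaMA2-13B with 6-bit activation quantization, far better than the 21.35 that BiLLM obtains without activation quantization, which shows its potential for compressing LLMs for real-world applications.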
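The intuition behind the abstract's two claims can be illustrated with a small numerical sketch. This is not the LTDB algorithm: the abstract does not specify the transformation, so the code assumes plain sign binarization with a single scale `alpha = mean(|w|)`, toy Gaussian data, and a hypothetical per-input-channel scale vector `s`. It only shows that (1) a dual-bell weight distribution binarizes with lower relative error than a single-bell one, and (2) scaling the weights while applying the inverse scale to the activations leaves the layer output unchanged.

```python
import numpy as np


def binarize(w):
    """Sign binarization with one scale, alpha = mean(|w|).

    For a fixed sign pattern this alpha minimizes ||w - alpha * sign(w)||^2.
    (Assumed baseline binarizer, not the paper's method.)
    """
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)


def relative_error(w):
    """Relative squared binarization error ||w - b||^2 / ||w||^2."""
    b = binarize(w)
    return float(np.sum((w - b) ** 2) / np.sum(w ** 2))


rng = np.random.default_rng(0)
n = 100_000

# Single-bell: zero-mean Gaussian weights (toy data).
single_bell = rng.normal(0.0, 1.0, size=n)

# Dual-bell: two narrow Gaussians centred at -1 and +1 (toy values).
centres = rng.choice([-1.0, 1.0], size=n)
dual_bell = centres + rng.normal(0.0, 0.2, size=n)

err_single = relative_error(single_bell)
err_dual = relative_error(dual_bell)
print(f"single-bell relative error: {err_single:.3f}")
print(f"dual-bell relative error:   {err_dual:.3f}")

# Equivalent transformation: scale weights per input channel by s and
# activations by 1/s. The scale vector s is hypothetical; in a learnable
# scheme it would be optimized rather than sampled.
x = rng.normal(size=(4, 8))        # activations: tokens x in_features
w = rng.normal(size=(8, 3))        # weights: in_features x out_features
s = rng.uniform(0.5, 2.0, size=8)  # positive per-channel scales

y_ref = x @ w
y_eq = (x / s) @ (w * s[:, None])
print("outputs match:", np.allclose(y_ref, y_eq))
```

In this toy setting the dual-bell weights sit close to the two binarized values, which is why their error is much smaller; the second part shows why such a reshaping of the weights can be paid for on the activation side without changing the full-precision output.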


Related papers