Fugu-MT paper translation (abstract): ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs

Paper overview: ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs

arxiv url: http://arxiv.org/abs/2604.03298v1
Date: Sat, 28 Mar 2026 16:11:56 GMT
Status: translation complete
System update date: 2026-04-07 15:49:18.481634
Title: ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs
Title (reference translation): ENEC: A Lossless AI Model Compression Method That Enables Fast Inference on Ascend NPUs
Authors: Jinwu Yang, Jiaan Wu, Zedong Liu, Xinyang Ma, Hairui Zhao, Yida Gu, Yuanhong Huang, Xingchen Liu, Wenjing Huang, Zheng Wei, Jing Xing, Yili Ma, Qingyi Zhang, Baoyi An, Zhongzhe Hu, Shaoteng Liu, Xia Zhu, Jiaxun Lu, Guangming Tan, Dingwen Tao
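Abstract summary: ENEC is a novel lossless compression method specifically customized for AI model weights and optimized for Ascend Neural Processing Units. Compared to leading GPU solutions, ENEC achieves 3.43X higher throughput than DietGPU and a 1.12X better compression ratio than nvCOMP. ENEC significantly improves end-to-end inference performance, achieving up to a 6.3X speedup.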
Reference score (independently calculated attention score): 13.980477697764014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid scaling of Large Language Models presents significant challenges for their deployment and inference, particularly on resource-constrained specialized AI hardware accelerators such as Huawei's Ascend NPUs, where weight data transfer has become a critical performance bottleneck. While lossless compression can preserve model accuracy and reduce data volume, existing lossless compression algorithms exhibit extremely low throughput when ported to the Ascend NPU architecture. In this paper, we propose ENEC, a novel lossless compression method specifically customized for AI model weights and optimized for Ascend Neural Processing Units. ENEC adopts a block-based fixed-length encoding scheme and incorporates a series of NPU-specific optimizations: bit-width quantization with hierarchical halving bit-packing, vectorized branch-free integer transformation, and dependency-decoupled intra-segment scan for efficient prefix-sum computation. Experimental results demonstrate that ENEC outperforms existing state-of-the-art NPU compressors in both compression ratio and throughput. Compared to leading GPU solutions, ENEC achieves a 3.43X higher throughput than DietGPU and a 1.12X better compression ratio than nvCOMP. By reducing weight transmission overhead, ENEC significantly improves end-to-end inference performance, achieving up to a 6.3X speedup. On Ascend NPUs, ENEC is the first open-source lossless compression algorithm for model weights that achieves performance comparable to state-of-the-art GPU compressors, offering an effective solution for deploying large-scale AI models.
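Abstract (reference translation): The rapid scaling of large language models poses serious challenges for their deployment and inference, especially on resource-constrained AI hardware accelerators such as Huawei's Ascend NPUs, where weight data transfer has become a critical performance bottleneck. Lossless compression can preserve model accuracy while reducing data volume, but existing lossless compression algorithms show extremely low throughput when ported to the Ascend NPU architecture. This paper proposes ENEC, a new lossless compression method tailored to AI model weights and optimized for Ascend Neural Processing Units. ENEC adopts a block-based fixed-length encoding scheme and incorporates NPU-specific optimizations: bit-width quantization with hierarchical halving bit-packing, vectorized branch-free integer transformation, and a dependency-decoupled intra-segment scan for efficient prefix-sum computation. Experiments show that ENEC outperforms existing state-of-the-art NPU compressors in both compression ratio and throughput. Compared to leading GPU solutions, ENEC achieves 3.43X higher throughput than DietGPU and a 1.12X better compression ratio than nvCOMP. By reducing weight transmission overhead, ENEC achieves up to a 6.3X speedup and significantly improves end-to-end inference performance. On Ascend NPUs, ENEC is the first open-source lossless compression algorithm for model weights to achieve performance comparable to state-of-the-art GPU compressors, providing an effective solution for deploying large-scale AI models.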
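
Illustrative note: the abstract names block-based fixed-length encoding and prefix-sum computation but gives no implementation details. The Python sketch below shows only the general idea behind those two terms and is not ENEC's actual algorithm. The block size, the function names (`encode`, `block_offsets`, `decode`), and the choice of one bit width per block are all assumptions made for illustration. Each block stores its values at a single fixed width, and an exclusive prefix sum over the block sizes gives every block's start position, so blocks can be located independently.

```python
from itertools import accumulate

BLOCK = 4  # hypothetical block size, kept small for readability


def encode(values, block=BLOCK):
    """Pack non-negative ints block by block, one fixed bit width per block."""
    widths, payload, nbits = [], 0, 0
    for i in range(0, len(values), block):
        chunk = values[i:i + block]
        w = max(v.bit_length() for v in chunk)  # width that fits every value in the block
        widths.append(w)
        for v in chunk:
            payload |= v << nbits  # append v at the current bit position
            nbits += w
    return widths, payload


def block_offsets(widths, n, block=BLOCK):
    """Exclusive prefix sum of per-block sizes (in bits): each block's start bit."""
    sizes = [w * min(block, n - b * block) for b, w in enumerate(widths)]
    return [0] + list(accumulate(sizes))[:-1]


def decode(widths, payload, n, block=BLOCK):
    """Recover the original values using only the widths and the prefix-sum offsets."""
    out = []
    offsets = block_offsets(widths, n, block)
    for b, (w, off) in enumerate(zip(widths, offsets)):
        count = min(block, n - b * block)  # the last block may be shorter
        mask = (1 << w) - 1
        for k in range(count):
            out.append((payload >> (off + k * w)) & mask)
    return out


vals = [3, 1, 2, 0, 200, 17, 5, 9, 1]
widths, payload = encode(vals)
assert decode(widths, payload, len(vals)) == vals  # lossless round trip
```

Because every value inside a block has the same width, decoding needs no per-value branching, and the offsets depend only on the small `widths` array, which is the kind of structure that suits parallel hardware.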

Related papers