Fugu-MT Paper Translation (Abstract): ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

Paper summary: ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2605.24011v1
Date: Tue, 19 May 2026 19:57:26 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:17.531233
Title: ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models
Title (reference translation): ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models
Authors: Arash Akbari, Arman Akbari, Masih Eskandar, Qitao Tan, Yixiao Chen, Jingwu Luo, Bertha Pangaribuan, Liyun Zhang, Jennifer Dy, Geng Yuan, Xue Lin, Gaowen Liu, Stratis Ioannidis, Yanzhi Wang
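Abstract summary: This paper introduces ActQuant, an action-guided mixed-precision PTQ framework. ActQuant includes (1) an inter-tensor bit allocator that assigns each weight matrix a single bit-width based on how much it contributes to predicting the agent's actions. The authors also introduce OmniModel.cpp, an agentic conversion pipeline that ports architectures to a native C/C++ runtime with efficient low-bit kernels.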
Reference score (independently computed attention score): 45.62029693245481
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute makes deployment on edge platforms impractical. Aggressive, sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime. To address this, we introduce ActQuant, an action-guided mixed-precision PTQ framework that operates in two stages: (1) an inter-tensor bit allocator that assigns each weight matrix a single bit-width based on how much it contributes to predicting the agent's actions; (2) an intra-tensor scale optimizer that tunes per-block quantization scales using action-aware curvature, so that dynamic range is concentrated on the weights most influential for control. To deliver the on-device benefits of our aggressive quantization, we further introduce OmniModel.cpp, an agentic conversion pipeline that ports architectures into a native C/C++ runtime with efficient low-bit kernels. We evaluate ActQuant both in simulation and on a real-world 6-DoF UR3 arm, with all models deployed through OmniModel.cpp. On the LIBERO benchmark, ActQuant is the only method that operates at or below 3 bits-per-weight (bpw), retaining 95.0% on OpenVLA-OFT and 94.8% on $π_{0.5}$. Pushed further, ActQuant reaches 2.5 bpw at 90.1% on OpenVLA-OFT, compressing the backbone from 14.3 GB to 2.7 GB (5.3$\times$). On the physical UR3 arm, $π_{0.5}$ quantized with ActQuant retains the baseline's success rate while reducing the memory footprint by 2.5$\times$.
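The first stage described in the abstract, assigning one bit-width per weight matrix under a bits-per-weight budget, can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy rule, the `TensorInfo` class, the `allocate_bits` function, the tensor names, and the `sensitivity` scores are all hypothetical stand-ins for whatever action-guided measure the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    name: str
    num_params: int
    sensitivity: float  # hypothetical score: higher = more important for actions


def allocate_bits(tensors, budget_bpw, choices=(2, 3, 4)):
    """Greedy mixed-precision allocation (illustrative assumption, not ActQuant).

    Every tensor starts at the lowest bit-width. Tensors are then visited in
    order of sensitivity per parameter, and each one is raised to the highest
    bit-width that still fits in the average bits-per-weight budget.
    """
    lowest = min(choices)
    bits = {t.name: lowest for t in tensors}
    total_params = sum(t.num_params for t in tensors)
    budget_bits = budget_bpw * total_params
    used_bits = lowest * total_params

    order = sorted(tensors, key=lambda t: t.sensitivity / t.num_params, reverse=True)
    for t in order:
        for b in sorted(choices, reverse=True):
            extra = (b - bits[t.name]) * t.num_params
            if extra > 0 and used_bits + extra <= budget_bits:
                bits[t.name] = b
                used_bits += extra
                break
    return bits, used_bits / total_params


# Made-up tensors and scores, for illustration only.
tensors = [
    TensorInfo("action_head", 1_000_000, 9.0),
    TensorInfo("attn.q_proj", 4_000_000, 4.0),
    TensorInfo("mlp.up_proj", 11_000_000, 2.2),
]
bits, avg_bpw = allocate_bits(tensors, budget_bpw=2.5)
print(bits, avg_bpw)
```

With these made-up inputs the small, highly sensitive tensor receives 4 bits, the large low-sensitivity tensor stays at 2 bits, and the average stays under the 2.5 bpw budget. A real allocator would likely solve this as a constrained optimization rather than greedily.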
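Abstract (reference translation): Vision-Language-Action (VLA) models show remarkable action generation for embodied intelligence, but their heavy compute makes deployment on edge platforms impractical. Aggressive sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime. The authors therefore introduce ActQuant, an action-guided mixed-precision PTQ framework consisting of (1) an inter-tensor bit allocator that assigns each weight matrix a single bit-width based on how much it contributes to predicting the agent's actions, and (2) an intra-tensor scale optimizer that tunes per-block quantization scales using action-aware curvature, so that dynamic range is concentrated on the weights most influential for control. OmniModel.cpp is an agentic conversion pipeline that ports architectures to a native C/C++ runtime with efficient low-bit kernels. ActQuant is evaluated in simulation and on a real-world 6-DoF UR3 arm, with all models deployed through OmniModel.cpp. On the LIBERO benchmark, ActQuant is the only method that operates at or below 3 bits-per-weight, retaining 95.0% on OpenVLA-OFT and 94.8% on $π_{0.5}$. Pushed further, ActQuant reaches 2.5 bpw at 90.1% on OpenVLA-OFT, compressing the backbone from 14.3 GB to 2.7 GB (5.3$\times$). On the physical UR3 arm, $π_{0.5}$ quantized with ActQuant retains the baseline's success rate while reducing the memory footprint by 2.5$\times$.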
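The compression figure quoted in the abstract can be checked with simple arithmetic. The sketch below uses only the two sizes given there (14.3 GB and 2.7 GB). The 16-bit baseline used for the idealized ratio is an assumption, since the abstract does not state the original precision, and the function names are hypothetical.

```python
def compression_ratio(original_gb: float, compressed_gb: float) -> float:
    """Ratio of original size to compressed size."""
    return original_gb / compressed_gb


def ideal_ratio(baseline_bits: float, target_bpw: float) -> float:
    """Ratio expected if every weight shrank from baseline_bits to target_bpw,
    with no storage overhead (an idealization)."""
    return baseline_bits / target_bpw


# Sizes reported in the abstract for the OpenVLA-OFT backbone.
reported = compression_ratio(14.3, 2.7)

# Assumption: 16-bit baseline weights (not stated in the abstract).
ideal = ideal_ratio(16, 2.5)

print(f"reported: {reported:.1f}x, ideal at 2.5 bpw from 16-bit: {ideal:.1f}x")
```

The reported sizes reproduce the stated 5.3x. The idealized figure under the 16-bit assumption is higher; one plausible explanation is storage that is not reduced to 2.5 bits, such as per-block scales, but the abstract does not break this down.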

Related papers