Related papers: HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models

HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models

URL: http://arxiv.org/abs/2602.13710v1
Date: Sat, 14 Feb 2026 10:23:45 GMT
Title: HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
Authors: Xin Yan, Zhenglin Wan, Feiyang Ye, Xingrui Yu, Hangyu Du, Yang You, Ivor Tsang,
Abstract summary: Vision-Language-Action (VLA) models enable instruction-following embodied control.<n>Current methods fail to narrow the distribution gap between binarized and full-precision weights.<n>We propose HBVLA, a VLA-tailored binarization framework.
Score: 11.913553037277472
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing methods fail to narrow the distribution gap between binarized and full-precision weights, causing quantization errors to accumulate under long-horizon closed-loop execution and severely degrade actions. To fill this gap, we propose HBVLA, a VLA-tailored binarization framework. First, we use a policy-aware enhanced Hessian to identify weights that are truly critical for action generation. Then, we employ a sparse orthogonal transform for non-salient weights to induce a low-entropy intermediate state. Finally, we quantize both salient and non-salient weights in the Harr domain with group-wise 1-bit quantization. We have evaluated our approach on different VLAs: on LIBERO, quantized OpenVLA-OFT retains 92.2% of full-precision performance; on SimplerEnv, quantized CogAct retains 93.6%, significantly outperforming state-of-the-art binarization methods. We further validate our method on real-world evaluation suite and the results show that HBVLA incurs only marginal success-rate degradation compared to the full-precision model, demonstrating robust deployability under tight hardware constraints. Our work provides a practical foundation for ultra-low-bit quantization of VLAs, enabling more reliable deployment on hardware-limited robotic platforms.

Related papers

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models [21.01470580488428]
Vision-language-action (VLA) models unify perception, language, and control for embodied agents.<n>We introduce QuantVLA, a training-free post-training quantization framework.<n>It is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head.
arXiv Detail & Related papers (2026-02-23T19:55:54Z)
QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization [29.21308068128823]
We introduce QVLA, the first action-centric quantization framework specifically designed for embodied control.<n>Our work establishes a new, principled foundation for compressing Vision-Language-Action models in robotics.
arXiv Detail & Related papers (2026-02-03T17:43:45Z)
Sliced-Wasserstein Distribution Alignment Loss Improves the Ultra-Low-Bit Quantization of Large Language Models [0.5964436882344729]
We introduce a sliced Wasserstein loss function for distribution-aware calibration in ultra-low-bit post-training quantization.<n>The proposed loss aligns the output distributions of full-precision and quantized models under random linear projections.<n>We demonstrate the performance gains of our proposed model by incorporating it with two frontier methods known as OmniQuant and TesseraQ.
arXiv Detail & Related papers (2026-01-11T15:14:05Z)
Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models [41.677469535447024]
Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices.<n>Post-training quantization (PTQ) is widely adopted for its efficiency, as it requires no retraining and only a small dataset for calibration.<n>Recent advances for post-training quantization have demonstrated that even sub-4-bit methods can maintain most of the original model performance.
arXiv Detail & Related papers (2025-12-25T12:39:36Z)
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose textbfTACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks.<n>Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it incurs significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z)
MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved textbfMixed textbfPrecision textbfQuantization framework for extremely low-bit textbfDiffusion textbfModels.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics [64.62231094774211]
Statefuls (e.g., Adam) maintain auxiliary information even 2x the model size in order to achieve optimal convergence.<n>SOLO enables Adam-styles to maintain quantized states with precision as low as 3 bits, or even 2 bits.<n>SOLO can thus be seamlessly applied to Adam-styles, leading to substantial memory savings with minimal accuracy loss.
arXiv Detail & Related papers (2025-05-01T06:47:45Z)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE)<n>RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.<n>Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
DB-LLM: Accurate Dual-Binarization for Efficient LLMs [83.70686728471547]
Large language models (LLMs) have significantly advanced the field of natural language processing. Existing ultra-low-bit quantization always causes severe accuracy drops. We propose a novel Dual-Binarization method for LLMs, namely DB-LLM.
arXiv Detail & Related papers (2024-02-19T09:04:30Z)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty.<n>We propose to adjust these distributions through weight finetuning to be more quantization-friendly.<n>Our method demonstrates its efficacy across three high-resolution image generation tasks.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models [21.855106896725598]
We introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision. Our method demonstrates significant improvements in both weight-only quantization and joint quantization of weights and activations. Our simple and effective approach makes it more practical for real-world applications.
arXiv Detail & Related papers (2023-09-06T06:51:15Z)
TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model. Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.