An Empirical Study of World Model Quantization
- URL: http://arxiv.org/abs/2602.02110v1
- Date: Mon, 02 Feb 2026 13:54:03 GMT
- Title: An Empirical Study of World Model Quantization
- Authors: Zhongqian Fu, Tianyi Zhao, Kai Han, Hang Zhou, Xinghao Chen, Yunhe Wang
- Abstract summary: We present a systematic empirical study of world model quantization using DINO-WM. We conduct experiments on different visual planning tasks across a wide range of bit-widths, quantization granularities, and planning horizons up to 50 iterations. Results show that quantization effects in world models extend beyond standard accuracy and bit-width trade-offs.
- Score: 34.94388089174202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models learn an internal representation of environment dynamics, enabling agents to simulate and reason about future states within a compact latent space for tasks such as planning, prediction, and inference. However, running world models incurs heavy computational cost and a large memory footprint, making model quantization essential for efficient deployment. To date, the effects of post-training quantization (PTQ) on world models remain largely unexamined. In this work, we present a systematic empirical study of world model quantization using DINO-WM as a representative case, evaluating diverse PTQ methods under both weight-only and joint weight-activation settings. We conduct extensive experiments on different visual planning tasks across a wide range of bit-widths, quantization granularities, and planning horizons up to 50 iterations. Our results show that quantization effects in world models extend beyond standard accuracy and bit-width trade-offs: group-wise weight quantization can stabilize low-bit rollouts, activation quantization granularity yields inconsistent benefits, and quantization sensitivity is highly asymmetric between the encoder and predictor modules. Moreover, aggressive low-bit quantization significantly degrades the alignment between the planning objective and task success, leading to failures that cannot be remedied by additional optimization. These findings reveal distinct quantization-induced failure modes in world model-based planning and provide practical guidance for deploying quantized world models under strict computational constraints. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/QuantWM.
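To make the studied setting concrete, here is a minimal, self-contained sketch of group-wise symmetric weight quantization, the PTQ granularity the abstract credits with stabilizing low-bit rollouts. This is an illustrative reimplementation under common conventions, not the authors' code; the function name and the `group_size`/`n_bits` defaults are assumptions.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, n_bits: int = 4, group_size: int = 128) -> np.ndarray:
    """Illustrative group-wise symmetric PTQ: each contiguous group of
    `group_size` weights gets its own scale, limiting how far a single
    outlier can inflate the quantization step. Not the paper's code."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for signed 4-bit
    flat = w.reshape(-1)
    pad = (-flat.size) % group_size        # pad so the tail forms a full group
    groups = np.pad(flat, (0, pad)).reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    deq = (q * scale).reshape(-1)[: flat.size]
    return deq.reshape(w.shape)

# Per-tensor quantization is the special case group_size == w.size;
# shrinking the group isolates outliers at the cost of storing more scales.
w = np.random.randn(256, 256).astype(np.float32)
err = np.abs(w - quantize_groupwise(w, n_bits=4, group_size=64)).mean()
print(f"mean absolute quantization error: {err:.4f}")
```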
Related papers
- Scaling Laws for Precision in High-Dimensional Linear Regression [38.87908801454087]
We study scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative and additive quantization, we identify a critical dichotomy in their scaling behaviors (the two noise models are contrasted in a short formula sketch after this list). Our work provides a theoretical basis for optimizing training protocols under practical hardware constraints.
arXiv Detail & Related papers (2026-02-22T15:51:29Z)
- Quantization-Aware Collaborative Inference for Large Embodied AI Models [67.66340659245186]
Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems.
arXiv Detail & Related papers (2026-02-13T16:08:19Z)
- LSGQuant: Layer-Sensitivity Guided Quantization for One-Step Diffusion Real-World Video Super-Resolution [52.627063566555194]
We introduce LSGQuant, a layer-sensitivity guided quantization approach for one-step diffusion-based real-world VSR. Our method incorporates a Dynamic Range Adaptive Quantizer (DRAQ) to fit video token activations. It performs nearly on par with the full-precision original model and significantly exceeds existing quantization techniques.
arXiv Detail & Related papers (2026-02-03T06:53:19Z)
- MoQE: Improve Quantization Model performance via Mixture of Quantization Experts [5.990018519616728]
Mixture of Quantization Experts (MoQE) is a quantization inference framework based on the Mixture-of-Experts architecture. MoQE combines multiple quantization variants of one full-precision model as specialized "quantization experts". We show that MoQE achieves performance comparable to SOTA quantization models without incurring significant increases in inference latency.
arXiv Detail & Related papers (2025-08-09T05:58:29Z)
- Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation [7.193483612237862]
In this study, we analyze the impact of quantization on model merging through the lens of error barriers. We propose a novel post-training quantization method, HDRQ (Hessian and distant regularizing quantization), designed with model merging for multi-target domain adaptation in mind. Our approach ensures that the quantization process incurs minimal deviation from the source pre-trained model while flattening the loss surface to facilitate smooth model merging.
arXiv Detail & Related papers (2025-05-29T17:00:56Z)
- Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [48.98109982725689]
We conduct the first systematic study on quantized reasoning models. Our investigation covers weight, KV cache, and activation quantization using state-of-the-art algorithms at varying bit-widths. We identify model size, model origin, and task difficulty as critical determinants of performance.
arXiv Detail & Related papers (2025-04-07T08:22:45Z)
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE). RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- A Data-Free Analytical Quantization Scheme for Deep Learning Models [1.815974770854455]
We introduce a novel post-training quantization method for model weights. Our method finds optimal clipping thresholds and scaling factors, along with mathematical guarantees that it minimizes quantization noise (a generic clipping-threshold search is sketched after this list). Empirical results on real-world datasets demonstrate that our quantization scheme significantly reduces model size and computational requirements while preserving model accuracy.
arXiv Detail & Related papers (2024-12-10T10:33:58Z)
- QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG). We demonstrate that weight quantization effectively leads to flatter minima in the loss landscape. QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
arXiv Detail & Related papers (2024-10-08T13:21:48Z)
- Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners [51.32182730502002]
We introduce Singular-value Diagonal Expansion to refine weight distributions and achieve better quantization alignment. Our plug-and-play weight-quantization methods demonstrate substantial performance improvements over state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-22T09:45:16Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Zero-shot Adversarial Quantization [11.722728148523366]
We propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer.
This is achieved by a novel two-level discrepancy modeling to drive a generator to synthesize informative and diverse data examples.
We conduct extensive experiments on three fundamental vision tasks, demonstrating the superiority of ZAQ over the strong zero-shot baselines.
arXiv Detail & Related papers (2021-03-29T01:33:34Z)
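The "Scaling Laws for Precision" entry above contrasts multiplicative and additive quantization. As a hedged illustration, the two noise models are commonly written as below; this notation is a standard convention and may differ from that paper's exact setup.

```latex
% Illustrative contrast (assumed notation, not necessarily the paper's):
% multiplicative noise scales with |w|; additive noise does not.
\[
\tilde{w}_{\mathrm{mult}} = w\,(1+\varepsilon),
\qquad
\tilde{w}_{\mathrm{add}} = w + \delta,
\qquad \varepsilon,\ \delta \ \text{zero-mean with variance } \sigma^2 .
\]
% Under multiplicative noise the induced error on a prediction x^\top w
% grows with \|w\|, which is one intuition for why the two regimes can
% obey different precision scaling laws.
```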
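The "Data-Free Analytical Quantization" entry mentions finding optimal clipping thresholds and scaling factors. The snippet below, referenced from that entry, is a generic grid search for the MSE-minimizing symmetric clipping threshold, a standard PTQ building block; the cited paper derives its thresholds analytically, and the names here (`best_clip_threshold`, `n_grid`) are illustrative.

```python
import numpy as np

def best_clip_threshold(w: np.ndarray, n_bits: int = 4, n_grid: int = 80) -> float:
    """Grid-search the symmetric clipping threshold that minimizes the
    mean-squared quantization error. A generic PTQ building block; the
    cited paper derives thresholds analytically rather than by search."""
    qmax = 2 ** (n_bits - 1) - 1
    w_absmax = np.abs(w).max()
    best_t, best_mse = w_absmax, np.inf
    for t in np.linspace(0.2 * w_absmax, w_absmax, n_grid):
        scale = t / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        mse = np.mean((w - q) ** 2)
        if mse < best_mse:
            best_t, best_mse = t, mse
    return best_t

# Clipping below the abs-max trades a little saturation error for a finer
# quantization step on the bulk of the weights.
w = np.random.randn(4096).astype(np.float32)
t = best_clip_threshold(w, n_bits=4)
print(f"chosen threshold {t:.3f} vs. abs-max {np.abs(w).max():.3f}")
```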