Quantized Visual Geometry Grounded Transformer
- URL: http://arxiv.org/abs/2509.21302v2
- Date: Tue, 30 Sep 2025 03:07:52 GMT
- Title: Quantized Visual Geometry Grounded Transformer
- Authors: Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu
- Abstract summary: This paper proposes the first quantization framework for VGGTs, namely QuantVGGT. We introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing. We also design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics.
- Score: 67.15451442018258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based 3D reconstruction models, represented by Visual Geometry Grounded Transformers (VGGTs), have made remarkable progress with the use of large-scale transformers. However, their prohibitive computational and memory costs severely hinder real-world deployment. Post-Training Quantization (PTQ) has become a common practice for compressing and accelerating models. We empirically observe that PTQ faces unique obstacles when compressing billion-scale VGGTs: the data-independent special tokens induce heavy-tailed activation distributions, while the multi-view nature of 3D data makes calibration sample selection highly unstable. This paper proposes the first quantization framework for VGGTs, namely QuantVGGT. It relies mainly on two technical contributions. First, we introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing to robustly mitigate heavy-tailed distributions and inter-channel variance. Second, we design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics and constructs frame-aware diverse calibration clusters to ensure stable quantization ranges. Comprehensive experiments demonstrate that QuantVGGT achieves state-of-the-art results across different benchmarks and bit-widths, surpassing the previous state-of-the-art generic quantization method by a large margin. We highlight that our 4-bit QuantVGGT can deliver a 3.7× memory reduction and 2.5× acceleration in real-hardware inference, while maintaining reconstruction accuracy above 98% of its full-precision counterpart. This demonstrates the advantages and practicality of QuantVGGT in resource-constrained scenarios. Our code is released at https://github.com/wlfeng0509/QuantVGGT.
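The abstract's idea of rotating activations with a Hadamard matrix before quantizing them can be illustrated with a small, self-contained sketch. This is not the QuantVGGT implementation: the function names (`hadamard_rotate`, `quantize_dequantize`, `mse`), the toy data (63 Gaussian values plus one large outlier), and the choice of a single symmetric scale per vector are all assumptions made for illustration, and the post-local channel smoothing step is not shown.

```python
import math
import random


def hadamard_rotate(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_dequantize(vec, bits=4):
    """Symmetric uniform quantization with one scale for the whole vector."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in vec)
    if peak == 0.0:
        return list(vec)
    step = peak / qmax
    return [max(-qmax, min(qmax, round(v / step))) * step for v in vec]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy heavy-tailed activation: small Gaussian values plus one large outlier
# (a stand-in for the effect the abstract attributes to special tokens).
random.seed(0)
activation = [random.gauss(0.0, 1.0) for _ in range(64)]
activation[0] = 50.0

# Baseline: quantize directly; the outlier stretches the 4-bit range.
direct_err = mse(activation, quantize_dequantize(activation))

# Rotate, quantize in the rotated space, then rotate back.
rotated = hadamard_rotate(activation)
recovered = hadamard_rotate(quantize_dequantize(rotated))
rotated_err = mse(activation, recovered)

print(f"4-bit MSE without rotation: {direct_err:.4f}")
print(f"4-bit MSE with rotation:    {rotated_err:.4f}")
```

In this toy setting the rotation spreads the outlier's energy across all 64 coordinates, so the quantization step shrinks and the reconstruction error drops. Real activations, per-channel or per-group scales, and calibration data would change the numbers.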
Related papers
- BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models [56.504879072674015]
We propose Bit-Plane Decomposition Quantization (BPDQ), which constructs a variable quantization grid via bit-planes and scalar coefficients. BPDQ enables serving Qwen2.5-72B on a single RTX 3090 with 83.85% GSM8K accuracy (vs. 90.83% at 16-bit).
arXiv Detail & Related papers (2026-02-04T02:54:37Z)
- Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z)
- PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models [22.35209793690791]
Quantization is commonly employed in neural networks to reduce model size and computational latency. Applying quantization to ARVG remains largely underexplored. We propose PTQ4ARVG, a training-free post-training quantization framework.
arXiv Detail & Related papers (2026-01-29T04:00:00Z)
- LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation [41.66473889057111]
Diffusion Transformers (DiTs) have achieved impressive performance in text-to-image and text-to-video generation. DiTs' high computational cost and large parameter sizes pose significant challenges for usage in resource-constrained scenarios. We propose LRQ-DiT, an efficient and accurate post-training quantization framework for image and video generation.
arXiv Detail & Related papers (2025-08-05T14:16:11Z)
- OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting [20.944120156871108]
Post-training quantization (PTQ) has emerged as a widely adopted technique for compressing and accelerating Large Language Models (LLMs). The major challenge in LLM quantization is that uneven and heavy-tailed data distributions can expand the quantization range, thereby reducing bit precision for most values. We introduce Quantization Space Utilization Rate (QSUR), a novel metric that effectively assesses the quantizability of transformed data by measuring the space utilization of the data in the quantization space.
arXiv Detail & Related papers (2025-01-23T08:24:25Z)
- CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes [53.107474952492396]
CityGaussianV2 is a novel approach for large-scale scene reconstruction. We implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. Our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs.
arXiv Detail & Related papers (2024-11-01T17:59:31Z)
- FlatQuant: Flatness Matters for LLM Quantization [58.28221892035609]
We propose FlatQuant, a new post-training quantization approach that enhances the flatness of weights and activations. Our approach identifies optimal affine transformations for each linear layer, calibrated in hours via a lightweight objective. It achieves less than 1% accuracy drop for W4A4 quantization on the LLaMA-3-70B model, surpassing SpinQuant by 7.5%.
arXiv Detail & Related papers (2024-10-12T08:10:28Z)
- SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single-batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.