Tail-Aware Post-Training Quantization for 3D Geometry Models
- URL: http://arxiv.org/abs/2602.01741v1
- Date: Mon, 02 Feb 2026 07:21:15 GMT
- Title: Tail-Aware Post-Training Quantization for 3D Geometry Models
- Authors: Sicheng Pan, Chen Tang, Shuzhao Xie, Ke Yang, Weixiang Zhang, Jiawei Li, Bin Chen, Shu-Tao Xia, Zhi Wang
- Abstract summary: Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
- Score: 58.79500829118265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The burgeoning complexity and scale of 3D geometry models pose significant challenges for deployment on resource-constrained platforms. While Post-Training Quantization (PTQ) enables efficient inference without retraining, conventional methods, primarily optimized for 2D Vision Transformers, fail to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. To address these challenges, we propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline specifically engineered for 3D geometric learning. Our contribution is threefold: (1) To overcome the data-scale bottleneck in 3D datasets, we develop a progressive coarse-to-fine calibration construction strategy that constructs a highly compact subset to achieve both statistical purity and geometric representativeness. (2) We reformulate the quantization interval search as an optimization problem and introduce a ternary-search-based solver, reducing the computational complexity from $\mathcal{O}(N)$ to $\mathcal{O}(\log N)$ for accelerated deployment. (3) To mitigate quantization error accumulation, we propose TRE-Guided Module-wise Compensation, which utilizes a Tail Relative Error (TRE) metric to adaptively identify and rectify distortions in modules sensitive to long-tailed activation outliers. Extensive experiments on the VGGT and Pi3 benchmarks demonstrate that TAPTQ consistently outperforms state-of-the-art PTQ methods in accuracy while significantly reducing calibration time. The code will be released soon.
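Two of the contributions above are concrete enough to illustrate with a toy sketch. The code below is not the authors' implementation (which is not yet released): it assumes symmetric uniform quantization, treats calibration error as approximately unimodal in the clipping value so a ternary search over N candidate intervals needs only O(log N) error evaluations, and uses one plausible reading of a Tail Relative Error metric (relative error restricted to high-magnitude activations); the actual TAPTQ definitions may differ.

```python
import numpy as np

def quantize(x, scale, n_bits=8):
    """Symmetric uniform quantization of x with a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def quant_error(x, clip_val, n_bits=8):
    """MSE between x and its quantized reconstruction for a clipping value."""
    qmax = 2 ** (n_bits - 1) - 1
    return float(np.mean((x - quantize(x, clip_val / qmax, n_bits)) ** 2))

def ternary_search_clip(x, n_bits=8, num_candidates=1024):
    """Pick a clipping value by ternary search over N candidates:
    O(log N) error evaluations instead of a linear scan."""
    max_abs = float(np.max(np.abs(x))) + 1e-12
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    lo, hi = 0, num_candidates - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if quant_error(x, candidates[m1], n_bits) < quant_error(x, candidates[m2], n_bits):
            hi = m2
        else:
            lo = m1
    best = min(range(lo, hi + 1), key=lambda i: quant_error(x, candidates[i], n_bits))
    return candidates[best]

def tail_relative_error(x, x_quant, quantile=0.99):
    """Illustrative tail relative error: relative reconstruction error restricted
    to activations whose magnitude lies in the long tail (above a high quantile)."""
    mask = np.abs(x) >= np.quantile(np.abs(x), quantile)
    return float(np.sum(np.abs(x[mask] - x_quant[mask])) / (np.sum(np.abs(x[mask])) + 1e-12))

# Toy usage on a heavy-tailed activation sample
acts = np.random.standard_cauchy(10000).astype(np.float32)
clip = ternary_search_clip(acts)
print(clip, tail_relative_error(acts, quantize(acts, clip / 127.0)))
```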
Related papers
- Mixed Precision PointPillars for Efficient 3D Object Detection with TensorRT [0.0]
We propose a mixed precision framework designed for PointPillars. Our method provides mixed precision models without training in the PTQ pipeline. With TensorRT deployment, our models reduce latency and model size by up to 2.35 and 2.26 times, respectively.
arXiv Detail & Related papers (2026-01-19T00:59:13Z) - ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models [88.04431808574581]
ITS3D is a framework that formulates the task as an optimization problem to identify the most effective Gaussian noise input. We introduce three techniques for improved stability, efficiency, and exploration capability. Experiments demonstrate that ITS3D enhances text-to-3D generation quality.
arXiv Detail & Related papers (2025-11-27T13:46:16Z) - Quantized Visual Geometry Grounded Transformer [67.15451442018258]
This paper proposes the first quantization framework for VGGTs, namely QuantVGGT. We introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing. We also design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics. (A minimal sketch of the rotation-plus-smoothing idea appears after this list.)
arXiv Detail & Related papers (2025-09-25T15:17:11Z) - Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z) - Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation [1.8999296421549172]
We introduce the Spectral Compression Transformer (SCT) to reduce sequence length and accelerate computation. The LPG generates skeletal position information that complements the input 2D joint positions. Our model achieves state-of-the-art performance with improved computational efficiency.
arXiv Detail & Related papers (2025-05-27T15:08:03Z) - Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local Spherical Euclidean 3D equivariance property through SE(3) message passing based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z) - Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on Human3.6M, MPI-INF-3DHP and Human-Eva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - Taming 3DGS: High-Quality Radiance Fields with Limited Resources [50.92437599516609]
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering.
We tackle the challenges of training and rendering 3DGS models on a budget.
We derive faster, numerically equivalent solutions for gradient computation and attribute updates.
arXiv Detail & Related papers (2024-06-21T20:44:23Z) - LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection [35.35457515189062]
Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks.
LiDAR-PTQ can achieve state-of-the-art quantization performance when applied to CenterPoint.
LiDAR-PTQ is cost-effective, being $30\times$ faster than the quantization-aware training method.
arXiv Detail & Related papers (2024-01-29T03:35:55Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it. (A simplified step-wise refinement loop in this spirit is sketched after this list.)
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
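For the QuantVGGT entry above, the rotation-plus-smoothing idea can be illustrated with a short, self-contained sketch. This is a generic reconstruction under stated assumptions (a power-of-two channel count so a Hadamard matrix exists, SmoothQuant-style per-channel scaling folded into a following weight matrix), not the QuantVGGT implementation; the function names are made up for the example.

```python
import numpy as np
from scipy.linalg import hadamard

def quantize_sym(x, n_bits=8):
    """Per-tensor symmetric uniform quantization (toy helper)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def rotate_and_smooth(acts, weight, alpha=0.5):
    """Rotate activations with an orthonormal Hadamard matrix to spread channel
    outliers, then rescale channels so activation and weight ranges are balanced.
    acts: (tokens, channels), weight: (channels, out_features)."""
    c = acts.shape[1]
    H = hadamard(c) / np.sqrt(c)       # orthonormal rotation
    acts_rot = acts @ H                # rotate activations
    weight_rot = H.T @ weight          # fold the inverse rotation into the weights
    # Per-channel smoothing: divide activations by s, multiply weights by s
    a_max = np.max(np.abs(acts_rot), axis=0) + 1e-12
    w_max = np.max(np.abs(weight_rot), axis=1) + 1e-12
    s = a_max ** alpha / w_max ** (1 - alpha)
    return acts_rot / s, weight_rot * s[:, None]

# Toy check: the transformed pair computes the same linear-layer output
rng = np.random.default_rng(0)
acts = rng.standard_normal((16, 64))
acts[:, 3] *= 50.0                     # inject a channel outlier
weight = rng.standard_normal((64, 32))
a2, w2 = rotate_and_smooth(acts, weight)
print(np.allclose(acts @ weight, a2 @ w2))                      # exact in float
print(np.abs(acts @ weight - quantize_sym(a2) @ quantize_sym(w2)).mean())
```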
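The Reinforced Axial Refinement entry describes adjusting one 3D parameter per step. The toy loop below captures that step-wise structure with a greedy search standing in for the learned RL policy (the actual method trains a policy with a delayed reward and does not see the ground truth at inference); the box parameterization and scoring function here are assumptions for illustration.

```python
import numpy as np

def score(pred, gt):
    """Toy score: negative L2 distance between 3D box parameters
    (x, y, z, w, h, l, yaw); a real system would use 3D IoU."""
    return -float(np.linalg.norm(pred - gt))

def axial_refine(init_box, gt_box, step_sizes, n_steps=50):
    """Refine one parameter per step, greedily picking the single-axis move that
    most improves the score. The ground-truth-based score is only available when
    training the policy; here it plays the role of an oracle reward."""
    box = init_box.copy()
    for _ in range(n_steps):
        best_move, best_score = None, score(box, gt_box)
        for axis, delta in enumerate(step_sizes):
            for sign in (+1.0, -1.0):
                cand = box.copy()
                cand[axis] += sign * delta        # change exactly one 3D parameter
                if score(cand, gt_box) > best_score:
                    best_move, best_score = (axis, sign * delta), score(cand, gt_box)
        if best_move is None:                     # no single-axis move helps: stop
            break
        box[best_move[0]] += best_move[1]
    return box

gt = np.array([1.0, 2.0, 0.5, 1.6, 1.5, 4.0, 0.3])
init = gt + np.array([0.8, -0.6, 0.2, 0.3, -0.2, 0.5, -0.2])
refined = axial_refine(init, gt, np.full(7, 0.1))
print(score(init, gt), score(refined, gt))
```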