Related papers: Mixed Precision PointPillars for Efficient 3D Object Detection with TensorRT

Mixed Precision PointPillars for Efficient 3D Object Detection with TensorRT

URL: http://arxiv.org/abs/2601.12638v1
Date: Mon, 19 Jan 2026 00:59:13 GMT
Title: Mixed Precision PointPillars for Efficient 3D Object Detection with TensorRT
Authors: Ninnart Fuengfusin, Keisuke Yoneda, Naoki Suganuma,
Abstract summary: We propose a mixed precision framework designed for PointPillars.<n>Our methods provides mixed precision models without training in the PTQ pipeline.<n>WithRT deployment, our models offer less latency and sizes by up to 2.35 and 2.26 times, respectively.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LIDAR 3D object detection is one of the important tasks for autonomous vehicles. Ensuring that this task operates in real-time is crucial. Toward this, model quantization can be used to accelerate the runtime. However, directly applying model quantization often leads to performance degradation due to LIDAR's wide numerical distributions and extreme outliers. To address the wide numerical distribution, we proposed a mixed precision framework designed for PointPillars. Our framework first searches for sensitive layers with post-training quantization (PTQ) by quantizing one layer at a time to 8-bit integer (INT8) and evaluating each model for average precision (AP). The top-k most sensitive layers are assigned as floating point (FP). Combinations of these layers are greedily searched to produce candidate mixed precision models, which are finalized with either PTQ or quantization-aware training (QAT). Furthermore, to handle outliers, we observe that using a very small number of calibration data reduces the likelihood of encountering outliers, thereby improving PTQ performance. Our methods provides mixed precision models without training in the PTQ pipeline, while our QAT pipeline achieves the performance competitive to FP models. With TensorRT deployment, our models offer less latency and sizes by up to 2.35 and 2.26 times, respectively.

Related papers

Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining.<n>PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead.<n>We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z)
PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks [9.463776523295303]
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches.<n>We propose PTQAT, a novel general hybrid quantization algorithm for the efficient deployment of 3D perception networks.
arXiv Detail & Related papers (2025-08-14T11:55:21Z)
Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection [9.961425621432474]
We propose Q-PETR, a quantization-aware position embedding transformation that re-engineers key components of the PETR framework.<n>Q-PETR maintains floating-point performance with a performance degradation of less than 1% under standard 8-bit per-tensor post-training quantization.<n>Compared to its FP32 counterpart, Q-PETR achieves a two-fold speedup and reduces memory usage by three times.
arXiv Detail & Related papers (2025-02-21T14:26:23Z)
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection [35.35457515189062]
Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks. LiDAR-PTQ can achieve state-of-the-art quantization performance when applied to CenterPoint. LiDAR-PTQ is cost-effective being $30times$ faster than the quantization-aware training method.
arXiv Detail & Related papers (2024-01-29T03:35:55Z)
SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
Main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single batch inference. We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
GMS-3DQA: Projection-based Grid Mini-patch Sampling for 3D Model Quality Assessment [82.93561866101604]
Previous projection-based 3DQA methods directly extract features from multi-projections to ensure quality prediction accuracy. We propose a no-reference (NR) projection-based textitunderlineGrid underlineMini-patch underlineSampling underline3D Model underlineQuality underlineAssessment (GMS-3DQA) method. The proposed GMS-3DQA requires far less computational resources and inference time than other 3D
arXiv Detail & Related papers (2023-06-09T03:53:12Z)
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs. Our experiments demonstrate latency reductions compared to a 16-bit baseline of $25.48%$, $21.69%$, and $33.28%$ respectively.
arXiv Detail & Related papers (2023-06-08T02:18:58Z)
When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable. In order to achieve a better accuracy, we propose two lightweight modules. DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers. QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
Quantization-Guided Training for Compact TinyML Models [8.266286436571887]
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets. QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors.
arXiv Detail & Related papers (2021-03-10T18:06:05Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.