QuantV2X: A Fully Quantized Multi-Agent System for Cooperative Perception
- URL: http://arxiv.org/abs/2509.03704v1
- Date: Wed, 03 Sep 2025 20:39:03 GMT
- Title: QuantV2X: A Fully Quantized Multi-Agent System for Cooperative Perception
- Authors: Seth Z. Zhao, Huizhi Zhang, Zhaowei Li, Juntong Peng, Anthony Chui, Zewei Zhou, Zonglin Meng, Hao Xiang, Zhiyu Huang, Fujia Wang, Ran Tian, Chenfeng Xu, Bolei Zhou, Jiaqi Ma
- Abstract summary: We introduce QuantV2X, the first fully quantized multi-agent system for efficient deployment of cooperative perception. Despite operating under low-bit constraints, QuantV2X achieves accuracy comparable to full-precision systems. Results highlight the viability of a fully quantized multi-agent intermediate fusion system for real-world deployment.
- Score: 47.35478308553379
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Cooperative perception through Vehicle-to-Everything (V2X) communication offers significant potential for enhancing vehicle perception by mitigating occlusions and expanding the field of view. However, past research has predominantly focused on improving accuracy metrics without addressing the crucial system-level considerations of efficiency, latency, and real-world deployability. Notably, most existing systems rely on full-precision models, which incur high computational and transmission costs, making them impractical for real-time operation in resource-constrained environments. In this paper, we introduce QuantV2X, the first fully quantized multi-agent system designed specifically for efficient and scalable deployment of multi-modal, multi-agent V2X cooperative perception. QuantV2X introduces a unified end-to-end quantization strategy across both neural network models and transmitted message representations that simultaneously reduces computational load and transmission bandwidth. Remarkably, despite operating under low-bit constraints, QuantV2X achieves accuracy comparable to full-precision systems. More importantly, when evaluated under deployment-oriented metrics, QuantV2X reduces system-level latency by 3.2× and achieves a +9.5 improvement in mAP30 over full-precision baselines. Furthermore, QuantV2X scales more effectively, enabling larger and more capable models to fit within strict memory budgets. These results highlight the viability of a fully quantized multi-agent intermediate fusion system for real-world deployment. The system will be publicly released to promote research in this field: https://github.com/ucla-mobility/QuantV2X.
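To make the bandwidth argument in the abstract concrete, here is a minimal sketch of generic uniform affine quantization applied to a flattened feature message. It is not QuantV2X's actual scheme, which the abstract does not detail; the function names (`quantize`, `dequantize`, `payload_bytes`) and the sample values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Map floats to unsigned num_bits integer codes (uniform affine quantization)."""
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, which would otherwise give a zero scale.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Round to the nearest code and clamp into the representable range.
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [(c - zero_point) * scale for c in codes]


def payload_bytes(num_values: int, bits_per_value: int) -> int:
    """Bytes needed to transmit num_values at the given bit width (scale/zero-point overhead ignored)."""
    return (num_values * bits_per_value + 7) // 8


# Hypothetical feature values that one agent would share with another.
features = [-1.0, -0.25, 0.0, 0.5, 2.0]
codes, scale, zero_point = quantize(features, num_bits=8)
restored = dequantize(codes, scale, zero_point)
```

Under these assumptions, sending 8-bit codes instead of 32-bit floats cuts the message payload to a quarter, at the cost of a per-value rounding error of roughly half a quantization step.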
Related papers
- HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis [0.0]
We propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net). HyPCA-Net is composed of two core novel blocks: (a) a computationally efficient residual adaptive learning attention block for capturing modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Experiments show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost.
arXiv Detail & Related papers (2026-02-18T07:47:49Z) - Dual Latent Memory for Visual Multi-agent System [69.29799381195592]
Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration. Increasing agent turns often degrades performance while exponentially inflating token costs. We propose L$2$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories.
arXiv Detail & Related papers (2026-01-31T02:49:10Z) - End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration [7.235781104512231]
X-2V2X is a multi-modal fused end-to-end framework for V2X collaboration. It unifies multi-view multimodal sensing within a shared representation. X-V2X achieves robust and temporally stable perception in complex traffic scenarios.
arXiv Detail & Related papers (2025-12-26T02:20:22Z) - Joint Channel Estimation and Computation Offloading in Fluid Antenna-assisted MEC Networks [81.36647816787713]
We propose an FA-assisted offloading framework to minimize the delay of channel estimation. We show that the proposed system significantly reduces the accuracy under efficient communication.
arXiv Detail & Related papers (2025-09-16T08:48:44Z) - A Lightweight Group Multiscale Bidirectional Interactive Network for Real-Time Steel Surface Defect Detection [15.140649886958945]
Group Multiscale Bidirectional Interactive (GMBI) modules enhance multiscale feature extraction and interaction. Experiments on SD-Saliency-900 and NRSD-MN datasets demonstrate that GMBINet delivers competitive accuracy with real-time speeds of 1048 FPS on GPU and 16.53 FPS on CPU at 512 resolution.
arXiv Detail & Related papers (2025-08-22T13:58:35Z) - Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative [31.74122603714625]
Mixture of Experts (MoE) architecture enhances model capacity through sparse activation. MoE faces two major difficulties in practical deployment. Under limited memory, efficient offloading and collaborative inference of expert modules struggle to balance latency and throughput. This paper proposes an efficient MoE edge deployment scheme based on Hessian-Aware Quantization (HAQ) and CPU-GPU collaborative inference.
arXiv Detail & Related papers (2025-08-10T12:59:57Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - EQuARX: Efficient Quantized AllReduce in XLA for Distributed Machine Learning Acceleration [3.757632817011334]
We present a native dynamic block-wise efficient quantized AllReduce within the XLA compiler for TPUs (EQuARX). By using TPU-friendly quantization and deep pipelining of communication and compute, EQuARX with int8 precision achieves a 1.8X speedup over baseline BF16 AllReduce.
arXiv Detail & Related papers (2025-06-21T06:54:52Z) - On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z) - SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
arXiv Detail & Related papers (2023-06-13T08:57:54Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR using low-bit quantization can achieve on-par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)