UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
- URL: http://arxiv.org/abs/2602.18758v1
- Date: Sat, 21 Feb 2026 08:45:05 GMT
- Title: UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
- Authors: Wenxuan Zeng, Chao Yang, Tianshi Xu, Bo Zhang, Changrui Ren, Jin Dong, Meng Li
- Abstract summary: Private convolutional neural network (CNN) inference suffers from high communication and latency overhead. We propose UFO, a quantized 2PC inference framework that jointly optimizes the 2PC protocols and the quantization algorithm. With extensive experiments, UFO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
- Score: 10.528544664165127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially from convolution layers. In this paper, we propose UFO, a quantized 2PC inference framework that jointly optimizes the 2PC protocols and quantization algorithm. UFO features a novel 2PC protocol that systematically combines the efficient Winograd convolution algorithm with quantization to improve inference efficiency. However, we observe that naively combining quantization and Winograd convolution faces the following challenges: 1) From the inference perspective, Winograd transformations introduce extensive additions and require frequent bit width conversions to avoid inference overflow, leading to non-negligible communication overhead; 2) From the training perspective, Winograd transformations introduce weight outliers that make quantization-aware training (QAT) difficult, resulting in inferior model accuracy. To address these challenges, we co-optimize both protocol and algorithm. 1) At the protocol level, we propose a series of graph-level optimizations for 2PC inference to minimize the communication. 2) At the algorithm level, we develop a mixed-precision QAT algorithm based on layer sensitivity to optimize model accuracy given communication constraints. To accommodate the outliers, we further introduce a 2PC-friendly bit re-weighting algorithm to increase the representation range without explicitly increasing bit widths. With extensive experiments, UFO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
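To make the protocol-level challenge concrete, here is a minimal plaintext sketch of the F(2,3) Winograd convolution the abstract builds on. This is an illustration only: UFO's actual protocol evaluates these transforms over secret-shared fixed-point tensors inside 2PC, where the extra additions and the widened value range of the transformed operands are exactly what forces the bit-width conversions the authors optimize.

```python
# Plaintext F(2,3) Winograd: 2 outputs from a 4-element input tile and a
# 3-tap filter, using 4 multiplications instead of 6 at the cost of extra
# additions (the standard transform matrices below are fixed constants).
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Return [y0, y1], the 3-tap correlation of the length-4 input tile d."""
    U = G @ g              # filter transform: note the growth in value range,
    V = B_T @ d            # which in fixed point demands wider bit widths
    return A_T @ (U * V)   # 4 elementwise multiplications + output transform

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
assert np.allclose(winograd_f23(d, g),
                   [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                    d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

Since B_T and A_T contain only 0 and ±1, their transforms cost additions, which are local (communication-free) on secret shares; the cryptographic cost concentrates in the four elementwise products, which is why Winograd can pay off in 2PC despite the extra additions.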
Related papers
- Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [45.99743804547533]
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data. TPAUC optimization remains under-explored. We introduce two innovative stochastic primal-dual double block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z)
- Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching-antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides equipped with numerous low-cost pinching antennas (PAs). The positions of the PAs can be reconfigured, adjusting both the large-scale path loss and the spatial distribution of the antennas.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization [2.9203160719029073]
Existing secure 2PC frameworks suffer from a high inference latency due to enormous communication.
We propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm.
We show PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
arXiv Detail & Related papers (2024-10-12T13:28:42Z)
- EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [3.1330492824737055]
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead.
We propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and the 2PC protocols.
With extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
arXiv Detail & Related papers (2024-04-15T01:41:18Z)
- HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference [2.498379184732383]
We propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols.
Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication reduction.
arXiv Detail & Related papers (2024-01-29T08:59:05Z)
- Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z)
- Optimal Privacy Preserving for Federated Learning in Mobile Edge Computing [35.57643489979182]
Federated Learning (FL) with quantization and deliberately added noise over wireless networks is a promising approach to preserving user differential privacy (DP).
This article aims to jointly optimize the quantization and Binomial mechanism parameters and communication resources to maximize the convergence rate under the constraints of the wireless network and DP requirement.
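As a rough illustration of the quantize-then-perturb pattern the summary refers to, the sketch below combines unbiased stochastic quantization with centered Binomial noise. The level count, the Binomial parameters n and p, and the noise scaling are placeholders for exposition, not the jointly optimized values the paper derives.

```python
# Hypothetical quantize-then-perturb sketch; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(x, num_levels=16, lo=-1.0, hi=1.0):
    """Unbiased stochastic rounding of x onto a uniform grid over [lo, hi]."""
    step = (hi - lo) / (num_levels - 1)
    k = (np.clip(x, lo, hi) - lo) / step
    lower = np.floor(k)
    k = lower + (rng.random(x.shape) < (k - lower))  # round up w.p. fractional part
    return lo + k * step

def binomial_perturb(q, n=32, p=0.5, step=2.0 / 15):
    """Add centered Binomial(n, p) noise, scaled to the quantization step."""
    noise = rng.binomial(n, p, size=q.shape) - n * p
    return q + step * noise

grad = 0.3 * rng.standard_normal(8)                   # a toy local model update
private_update = binomial_perturb(stochastic_quantize(grad))
```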
arXiv Detail & Related papers (2022-11-14T07:54:14Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI-model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
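For readers unfamiliar with the setting, here is a minimal sketch of decentralized consensus optimization over a fixed network, using textbook gradient tracking rather than the paper's optimal algorithms; the quadratic local objectives f_i(x) = 0.5 * (x - b_i)^2 and the 4-node ring are illustrative assumptions.

```python
# Gradient tracking (DIGing-style) on a 4-node ring; each node i holds b_i
# and the network jointly minimizes sum_i 0.5 * (x - b_i)^2.
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0])          # private data, one value per node
W = np.array([[0.50, 0.25, 0.00, 0.25],     # doubly stochastic mixing matrix
              [0.25, 0.50, 0.25, 0.00],     # for the fixed ring topology
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
lr = 0.2
x = np.zeros(4)                             # local iterates
g = x - b                                   # local gradients at the iterates
y = g.copy()                                # tracker of the average gradient

for _ in range(500):
    x = W @ x - lr * y                      # mix with neighbors, step along tracker
    g_new = x - b
    y = W @ y + g_new - g                   # update the average-gradient estimate
    g = g_new

assert np.allclose(x, b.mean(), atol=1e-5)  # consensus at the global minimizer 2.5
```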
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with its full-precision counterparts on five benchmark datasets.
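For context, below is a minimal sketch of the uniform "fake quantization" forward pass that quantization-aware training pipelines simulate; the symmetric per-tensor 4-bit scheme is a generic illustration, not FQSR's exact quantizer.

```python
# Generic symmetric fake quantization (quantize-dequantize); illustrative only.
import numpy as np

def fake_quant(x, bits=4):
    """Simulate `bits`-bit symmetric quantization by rounding to a scaled grid."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4 bits
    max_abs = np.abs(x).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                              # dequantize back to float

w = np.random.default_rng(0).standard_normal((8, 8))
w_q = fake_quant(w, bits=4)
print("max quantization error:", np.abs(w - w_q).max())
```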
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization under any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization, respectively.
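A hedged sketch of unbiased multi-level stochastic gradient quantization in the QSGD style is shown below; the uniform level placement is illustrative, whereas BinGrad and ORQ place levels according to the paper's derived optimal condition, which is not reproduced here.

```python
# Unbiased multi-level stochastic gradient quantization (uniform levels).
import numpy as np

rng = np.random.default_rng(0)

def quantize_unbiased(g, levels=4):
    """Stochastically round |g| / ||g||_inf onto `levels` uniform levels; E[q] = g."""
    norm = np.abs(g).max()
    if norm == 0.0:
        return g
    r = np.abs(g) / norm * (levels - 1)              # position on the level grid
    lower = np.floor(r)
    r = lower + (rng.random(g.shape) < (r - lower))  # unbiased stochastic rounding
    return np.sign(g) * r / (levels - 1) * norm

g = rng.standard_normal(6)
est = np.mean([quantize_unbiased(g) for _ in range(20000)], axis=0)
assert np.allclose(est, g, atol=0.02)                # Monte Carlo unbiasedness check
```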
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
- Least squares binary quantization of neural networks [19.818087225770967]
We focus on binary quantization, in which values are mapped to -1 and 1.
Inspired by the Pareto-optimality of 2-bit versus 1-bit quantization, we introduce a novel 2-bit quantization with provably minimal least-squares error.
arXiv Detail & Related papers (2020-01-09T00:01:14Z)
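The least-squares view behind this construction has a classic closed form in the 1-bit case: for a fixed sign pattern, the scale v minimizing ||w - v * sign(w)||^2 is the mean absolute weight. The sketch below verifies that fact numerically; the paper's 2-bit extension is not reproduced here.

```python
# Closed-form optimal scale for 1-bit (binary) least-squares quantization.
import numpy as np

w = np.random.default_rng(0).standard_normal(1000)
v_star = np.abs(w).mean()                  # argmin_v sum_i (|w_i| - v)^2

# cross-check against a brute-force grid search over candidate scales
grid = np.linspace(0.01, 2.0, 2000)
errors = [((w - v * np.sign(w)) ** 2).sum() for v in grid]
assert abs(grid[int(np.argmin(errors))] - v_star) < 1e-2
```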