CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference
- URL: http://arxiv.org/abs/2311.01737v1
- Date: Fri, 3 Nov 2023 06:19:48 GMT
- Title: CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference
- Authors: Wenxuan Zeng, Meng Li, Haichuan Yang, Wen-jie Lu, Runsheng Wang, Ru Huang
- Abstract summary: Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection.
Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead.
We present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture.
- Score: 13.039573608167077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection but suffers from orders of magnitude latency overhead due to enormous communication. Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead and focus on reducing the ReLUs to improve the communication efficiency. However, we observe these works achieve limited communication reduction for state-of-the-art (SOTA) 2PC protocols because they ignore the other linear and non-linear operations, which now contribute to the majority of communication. In this work, we present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture. CoPriv features a new 2PC protocol for convolution based on the Winograd transformation and develops DNN-aware optimization to significantly reduce the inference communication. CoPriv further develops a 2PC-aware network optimization algorithm that is compatible with the proposed protocol and simultaneously reduces the communication for all the linear and non-linear operations. We compare CoPriv with the SOTA 2PC protocol, CrypTFlow2, and demonstrate 2.1x communication reduction for both ResNet-18 and ResNet-32 on CIFAR-100. We also compare CoPriv with SOTA network optimization methods, including SNL and MetaPruning. CoPriv achieves 9.98x online and 3.88x total communication reduction, respectively, with higher accuracy compared to SNL. CoPriv also achieves 3.87x online communication reduction with more than 3% higher accuracy compared to MetaPruning.
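To make the Winograd intuition concrete: Winograd's F(2,3) algorithm produces two outputs of a 3-tap convolution with 4 multiplications instead of 6, and in 2PC it is the multiplications that require interaction, so fewer of them directly shrinks communication. The sketch below is a plain-Python illustration of F(2,3) itself, not CoPriv's actual protocol:

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap 1-D convolution using
    4 multiplications (m1..m4) instead of the direct method's 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.random.randn(4)  # input tile
g = np.random.randn(3)  # filter taps
direct = np.array([d[0:3] @ g, d[1:4] @ g])      # 6 multiplications
print(np.allclose(winograd_f23(d, g), direct))   # True
```

For 2-D convolutions the same idea applies tile-by-tile; e.g., F(2x2, 3x3) uses 16 multiplications where direct convolution needs 36.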
Related papers
- PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization [2.9203160719029073]
Existing secure 2PC frameworks suffer from high inference latency due to enormous communication.
We propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm.
We show PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
arXiv Detail & Related papers (2024-10-12T13:28:42Z)
- Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse [56.384390765357004]
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models.
This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
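For readers unfamiliar with hyperdimensional computing, the core operations are simple: symbols become random high-dimensional bipolar vectors, sets are "bundled" by elementwise majority, and associations are "bound" by elementwise product. A generic sketch of these primitives (not the paper's federated framework):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

def random_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

# Item memory: one hypervector per symbol
items = {s: random_hv() for s in "abc"}

# Bundling (superposition): elementwise majority of a set of hypervectors
bundle = np.sign(items["a"] + items["b"] + items["c"])

# Binding: elementwise product, yields a vector dissimilar to both inputs
bound = items["a"] * items["b"]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos(bundle, items["a"]))  # clearly positive: bundle stays similar to members
print(cos(bound, items["a"]))   # near 0: binding destroys similarity
```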
arXiv Detail & Related papers (2024-08-26T17:03:14Z)
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a recently emerged avenue that pursues efficient network sparsity with only limited data.
In this paper, we attempt to reconcile this disparity by carrying three cardinal factors that profoundly affect the performance of conventional sparsity over into the context of PTS.
Our framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks.
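As a point of reference for what PTS methods improve upon, a naive post-training baseline simply masks the smallest-magnitude weights after training. The sketch below shows that baseline (illustrative only, not UniPTS):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Naive post-training sparsity baseline: zero out the
    smallest-magnitude weights globally, keep the rest intact."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    threshold = np.quantile(flat, sparsity)  # global cut-off magnitude
    return [np.where(np.abs(w) >= threshold, w, 0.0) for w in weights]

layers = [np.random.randn(64, 64), np.random.randn(128, 64)]
pruned = magnitude_prune(layers, sparsity=0.9)
kept = sum((w != 0).sum() for w in pruned) / sum(w.size for w in layers)
print(f"density after pruning: {kept:.2%}")  # ~10%
```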
arXiv Detail & Related papers (2024-05-29T06:53:18Z)
- EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [3.1330492824737055]
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead.
We propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and 2PC protocols.
With extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
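The communication leverage of quantization in 2PC is direct: secret-shared operands are exchanged as fixed-point integers, so wire cost scales with bitwidth. A minimal symmetric uniform quantizer, sketched under generic assumptions rather than EQO's actual scheme:

```python
import numpy as np

def quantize(x, bits=4):
    """Symmetric uniform quantization to signed `bits`-bit integers.
    In 2PC, operands travel as secret-shared integers, so the wire
    cost of linear layers scales roughly with the bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

x = np.random.randn(1000).astype(np.float32)
q, s = quantize(x, bits=4)
print("max abs error:", np.abs(q * s - x).max())
# 4 bits per value instead of 32: roughly 8x fewer bits on the wire
```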
arXiv Detail & Related papers (2024-04-15T01:41:18Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
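The error-compensation idea behind such methods is to feed the compression residual back into the next step so no gradient information is permanently lost. A textbook error-feedback step with a top-k compressor (illustrative; not LIEC-SGD's exact update):

```python
import numpy as np

def topk(v, k):
    """Top-k sparsifier: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(w, grad, memory, lr=0.1, k=10):
    """Classic error-feedback step: compress (gradient + residual),
    apply the compressed update, and carry the compression error
    forward so it is compensated in later steps."""
    corrected = grad + memory
    compressed = topk(corrected, k)   # what actually gets communicated
    memory = corrected - compressed   # residual kept locally
    return w - lr * compressed, memory

w, memory = np.zeros(100), np.zeros(100)
grad = np.random.randn(100)
w, memory = ef_sgd_step(w, grad, memory)
```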
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors achieves a server-to-worker communication complexity that improves as the number of workers grows.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
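A permutation compressor can be sketched as follows: a shared random permutation partitions the d coordinates among the n workers, and each worker sends only its block scaled by n, which keeps the averaged estimator unbiased. A toy version under these assumptions (not MARINA-P's full method):

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_k_compress(vectors):
    """Permutation-style compressor: a shared random permutation
    partitions the d coordinates among the n workers; each worker
    transmits only its block, scaled by n so that the average of
    all compressed vectors is unbiased."""
    n, d = len(vectors), len(vectors[0])
    perm = rng.permutation(d)
    blocks = np.array_split(perm, n)
    out = []
    for v, block in zip(vectors, blocks):
        c = np.zeros(d)
        c[block] = n * v[block]  # only ~d/n values are actually sent
        out.append(c)
    return out

vs = [rng.standard_normal(8) for _ in range(4)]
compressed = perm_k_compress(vs)
# With identical inputs, the averaged estimator is exact:
same = [vs[0]] * 4
print(np.allclose(np.mean(perm_k_compress(same), axis=0), vs[0]))  # True
```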
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference [2.498379184732383]
We propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols.
Compared with prior-art HE-based protocols such as CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\times \sim 23.4\times$ communication reduction.
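The intuition for why low precision helps HE-based protocols: a ciphertext carries a fixed budget of plaintext bits, so lower-bitwidth tensor elements pack more densely and fewer ciphertexts cross the wire. A back-of-the-envelope model with hypothetical parameters (not HEQuant's actual packing):

```python
# Idealized model (hypothetical numbers, not HEQuant's real packing):
# an HE ciphertext carries a fixed number of plaintext bits, so halving
# the bitwidth of tensor elements roughly doubles how many fit per
# ciphertext and halves the ciphertexts (and bytes) on the wire.
plaintext_bits_per_ciphertext = 4096 * 16  # assumed slot count x slot width
tensor_elems = 1_000_000

for bits in (32, 16, 8, 4):
    per_ct = plaintext_bits_per_ciphertext // bits
    n_ct = -(-tensor_elems // per_ct)  # ceiling division
    print(f"{bits:2d}-bit elements -> {n_ct} ciphertexts")
```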
arXiv Detail & Related papers (2024-01-29T08:59:05Z)
- High-Throughput Secure Multiparty Computation with an Honest Majority in Various Network Settings [16.242352823823218]
We present novel protocols over rings for secure three-party computation (3PC) and malicious four-party computation (4PC) with one corruption.
Our protocols tolerate multiple arbitrarily weak network links between parties without any substantial decrease in performance.
They significantly reduce computational complexity by requiring up to half the number of basic instructions per gate compared to related work.
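For context, the standard building block of such protocols is secret sharing over a ring: linear operations are communication-free on shares, while multiplications require interaction. A textbook additive-sharing sketch (not the paper's optimized 3PC/4PC protocols):

```python
import random

random.seed(0)
MOD = 2 ** 64  # computation over the ring Z_{2^64}

def share(x, n=3):
    """Textbook additive secret sharing over Z_{2^64}: split x into n
    random shares that sum to x mod 2^64; any n-1 shares look uniform."""
    shares = [random.randrange(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

x, y = 42, 100
sx, sy = share(x), share(y)
# Addition is local: each party adds its own shares, no communication.
sz = [(a + b) % MOD for a, b in zip(sx, sy)]
print(reconstruct(sz))  # 142; multiplication is what requires interaction
```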
arXiv Detail & Related papers (2022-06-08T09:46:37Z)
- 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed [17.953619054149378]
We propose a new communication-efficient algorithm, 1-bit LAMB, which supports adaptive layerwise learning rates even when communication is compressed.
For the BERT-Large pre-training task with batch sizes from 8K to 64K, our evaluations demonstrate that 1-bit LAMB with an NCCL-based backend achieves up to 4.6x communication volume reduction.
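The 1-bit mechanism can be sketched generically: transmit only the signs of the error-corrected update plus a single scale, and keep the quantization residual in local memory. An illustrative version of this compressor (not the full 1-bit LAMB algorithm, which also handles adaptive layerwise learning rates):

```python
import numpy as np

def one_bit_compress(v, memory):
    """1-bit compression with error feedback, the mechanism behind
    1-bit Adam/LAMB-style methods: transmit only signs plus one scale,
    and fold the quantization error back into local memory."""
    corrected = v + memory
    scale = np.abs(corrected).mean()         # one float for the block
    compressed = scale * np.sign(corrected)  # 1 bit per element + scale
    memory = corrected - compressed
    return compressed, memory

v = np.random.randn(1_000_000).astype(np.float32)
memory = np.zeros_like(v)
c, memory = one_bit_compress(v, memory)
# 32 bits/elem -> 1 bit/elem (plus one scalar): ~32x smaller payload
```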
arXiv Detail & Related papers (2021-04-13T10:07:49Z)
- Deep Learning-based Resource Allocation For Device-to-Device Communication [66.74874646973593]
We propose a framework for the optimization of the resource allocation in multi-channel cellular systems with device-to-device (D2D) communication.
A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models.
Our simulation results confirm that near-optimal performance can be attained with low computation time, which underlines the real-time capability of the proposed scheme.
arXiv Detail & Related papers (2020-11-25T14:19:23Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
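One of ReActNet's generalized activations, RSign, adds a learnable per-channel threshold to the binarizing sign function at negligible extra compute. A minimal sketch in that spirit (shapes and initialization are assumptions, not the paper's code):

```python
import numpy as np

def rsign(x, alpha):
    """Generalized sign activation in the spirit of ReActNet's RSign:
    a learnable per-channel threshold `alpha` shifts where the
    binarization flips between -1 and +1."""
    # x: (channels, n); alpha: (channels, 1), learned during training
    return np.where(x - alpha > 0, 1.0, -1.0)

x = np.random.randn(4, 8)
alpha = np.zeros((4, 1))  # reduces to a plain sign() at zero thresholds
alpha_shifted = np.full((4, 1), 0.5)
print(rsign(x, alpha).mean(), rsign(x, alpha_shifted).mean())
```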
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.