MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine
Learning Inference
- URL: http://arxiv.org/abs/2209.13643v1
- Date: Tue, 27 Sep 2022 19:16:26 GMT
- Title: MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine
Learning Inference
- Authors: Yongqin Wang, Rachit Rajat, Murali Annavaram
- Abstract summary: Multi-party computing (MPC) has been gaining popularity over the past years as a secure computing model.
MPC has fewer overheads than homomorphic encryption (HE) and has a more robust threat model than hardware-based trusted execution environments.
MPC protocols still pay substantial performance penalties compared to plaintext when applied to machine learning algorithms.
- Score: 3.1853566662905943
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-party computing (MPC) has been gaining popularity over the past years
as a secure computing model, particularly for machine learning (ML) inference.
Compared with its competitors, MPC has fewer overheads than homomorphic
encryption (HE) and has a more robust threat model than hardware-based trusted
execution environments (TEE) such as Intel SGX. Despite its apparent
advantages, MPC protocols still pay substantial performance penalties compared
to plaintext when applied to ML algorithms. The overhead is due to added
computation and communication costs. For multiplications, which are ubiquitous
in ML algorithms, MPC protocols add 32x more computational cost and one round
of broadcasting among the MPC servers. Moreover, ML computations that have
trivial cost in plaintext, such as Softmax, ReLU, and other non-linear
operations, become very expensive due to added communication. These added
overheads make MPC less palatable for deployment in real-time ML inference
frameworks, such as speech translation.
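The one broadcast round per multiplication can be illustrated with a Beaver-triple multiplication over additive secret shares. The sketch below is a simplified single-process, two-party simulation (not the paper's protocol; the field modulus and helper names are illustrative) showing why each secret-shared multiplication forces the parties to exchange masked values before they can finish locally:

```python
import random

P = 2**61 - 1  # illustrative prime field modulus

def share(x):
    """Split x into two additive shares mod P."""
    r = random.randrange(P)
    return (x - r) % P, r

def beaver_mul(x_shares, y_shares, triple):
    """Multiply secret-shared x and y using a precomputed Beaver triple
    (a, b, c) with c = a*b.  Opening e = x - a and f = y - b requires each
    party to broadcast its shares: this exchange is the one communication
    round that MPC adds to every multiplication."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    e = (x_shares[0] - a0 + x_shares[1] - a1) % P  # opened value x - a
    f = (y_shares[0] - b0 + y_shares[1] - b1) % P  # opened value y - b
    # After the broadcast, each party finishes locally:
    # x*y = c + e*b + f*a + e*f  (split across the two parties)
    z0 = (c0 + e * b0 + f * a0 + e * f) % P        # party 0 adds e*f once
    z1 = (c1 + e * b1 + f * a1) % P
    return z0, z1

# Usage: 6 * 7 under secret sharing.
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))
z0, z1 = beaver_mul(share(6), share(7), triple)
assert (z0 + z1) % P == 42
```

Correctness follows from (a + e)(b + f) = ab + eb + fa + ef; the broadcast leaks nothing because e and f are uniformly masked by the triple.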
In this work, we present MPC-Pipe, an MPC pipeline inference technique that
uses two ML-specific approaches: 1) an inter-linear-layer pipeline and 2) an
inner-layer pipeline. These two techniques shorten the total inference runtime
of machine learning models. Our experiments show that, compared to current MPC
protocol implementations, MPC-Pipe reduces ML inference latency by up to 12.6%
when model weights are private and 14.48% when model weights are public.
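The pipelining idea can be shown in miniature: when a communication round and a piece of share computation are independent, running them concurrently hides the communication latency behind compute. The sketch below is a generic illustration of that overlap, not the paper's implementation; the `time.sleep` calls are stand-ins for network and compute time, and all durations are illustrative:

```python
import threading
import time

def broadcast_shares():
    """Simulated MPC communication round (network latency stand-in)."""
    time.sleep(0.05)

def local_compute():
    """Simulated computation on secret shares that does not depend on the
    in-flight broadcast (e.g. the next tile of a matrix multiplication)."""
    time.sleep(0.05)

# Serial schedule: communicate, then compute -- roughly 0.10 s total.
t0 = time.perf_counter()
broadcast_shares()
local_compute()
serial = time.perf_counter() - t0

# Pipelined schedule: the broadcast runs on a worker thread while the
# independent computation proceeds -- roughly 0.05 s total.
t0 = time.perf_counter()
worker = threading.Thread(target=broadcast_shares)
worker.start()
local_compute()
worker.join()
overlapped = time.perf_counter() - t0

assert overlapped < serial  # the overlap hides communication latency
```

The same principle applies across layers (start sending one layer's outputs while computing the next) and within a layer (interleave share computation with the broadcasts it does not depend on).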
Related papers
- HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one-billion-parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks [16.39761637882153]
We introduce CompactTag, a lightweight algorithm for generating MAC tags specifically tailored for linear layers in machine learning (ML) applications.
CompactTag speeds up this tag computation bottleneck by up to 23x, resulting in up to 1.47x total online phase runtime speedups for various ML workloads.
arXiv Detail & Related papers (2023-11-08T00:18:08Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Efficient Multi-stage Inference on Tabular Data [1.6371451481715193]
Conventional wisdom favors segregating ML code into services queried by product code via RPC APIs.
We simplify inference algorithms and embed them into the product code to reduce network communication.
By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50%.
arXiv Detail & Related papers (2023-03-21T04:01:55Z)
- Multi-Agent Automated Machine Learning [54.14038920246645]
We propose multi-agent automated machine learning (MA2ML) to handle joint optimization of modules in automated machine learning (AutoML).
MA2ML explicitly assigns credit to each agent according to its marginal contribution to enhance cooperation among modules, and incorporates off-policy learning to improve search efficiency.
Experiments show that MA2ML yields the state-of-the-art top-1 accuracy on ImageNet under constraints of computational cost.
arXiv Detail & Related papers (2022-10-17T13:32:59Z)
- A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules [8.224904698490626]
Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning accelerators.
We present a strategy using a deep reinforcement learning framework to emit a possibly invalid candidate partition that is then corrected by a constraint solver.
Our evaluation of a production-scale model, BERT, on real hardware reveals that the partitioning generated using the RL policy achieves 6.11% and 5.85% higher throughput.
arXiv Detail & Related papers (2021-12-07T23:40:28Z)
- CrypTen: Secure Multi-Party Computation Meets Machine Learning [25.21435023269728]
CrypTen is a software framework that exposes popular secure MPC primitives via abstractions common in modern machine-learning frameworks.
This paper describes the design of CrypTen and measures its performance on state-of-the-art models for text classification, speech recognition, and image classification.
arXiv Detail & Related papers (2021-09-02T14:36:55Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Lossless Compression of Efficient Private Local Randomizers [55.657133416044104]
Locally Differentially Private (LDP) Reports are commonly used for collection of statistics and machine learning in the federated setting.
In many cases the best known LDP algorithms require sending prohibitively large messages from the client device to the server.
This has led to significant efforts on reducing the communication cost of LDP algorithms.
arXiv Detail & Related papers (2021-02-24T07:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.