MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine
Learning Inference
- URL: http://arxiv.org/abs/2209.13643v1
- Date: Tue, 27 Sep 2022 19:16:26 GMT
- Title: MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine
Learning Inference
- Authors: Yongqin Wang, Rachit Rajat, Murali Annavaram
- Abstract summary: Multi-party computing (MPC) has been gaining popularity over the past years as a secure computing model.
MPC has fewer overheads than homomorphic encryption (HE) and has a more robust threat model than hardware-based trusted execution environments.
MPC protocols still pay substantial performance penalties compared to plaintext when applied to machine learning algorithms.
- Score: 3.1853566662905943
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-party computing (MPC) has been gaining popularity over the past years
as a secure computing model, particularly for machine learning (ML) inference.
Compared with its competitors, MPC has fewer overheads than homomorphic
encryption (HE) and has a more robust threat model than hardware-based trusted
execution environments (TEE) such as Intel SGX. Despite its apparent
advantages, MPC protocols still pay substantial performance penalties compared
to plaintext when applied to ML algorithms. The overhead is due to added
computation and communication costs. For multiplications, which are ubiquitous
in ML algorithms, MPC protocols add 32x more computational cost and one round
of broadcasting among the MPC servers. Moreover, ML computations that have
trivial cost in plaintext, such as Softmax, ReLU, and other non-linear
operations, become very expensive due to added communication. These added
overheads make MPC less palatable for deployment in real-time ML inference
frameworks, such as speech translation.
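The one broadcast round per multiplication can be illustrated with a Beaver-triple multiplication over additive secret shares. The sketch below is a simplified single-process, two-party simulation (not the paper's protocol; the field modulus and helper names are illustrative) showing why each secret-shared multiplication forces the parties to exchange masked values before they can finish locally:

```python
import random

P = 2**61 - 1  # illustrative prime field modulus

def share(x):
    """Split x into two additive shares mod P."""
    r = random.randrange(P)
    return (x - r) % P, r

def beaver_mul(x_shares, y_shares, triple):
    """Multiply secret-shared x and y using a precomputed Beaver triple
    (a, b, c) with c = a*b.  Opening e = x - a and f = y - b requires each
    party to broadcast its shares: this exchange is the one communication
    round that MPC adds to every multiplication."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    e = (x_shares[0] - a0 + x_shares[1] - a1) % P  # opened value x - a
    f = (y_shares[0] - b0 + y_shares[1] - b1) % P  # opened value y - b
    # After the broadcast, each party finishes locally:
    # x*y = c + e*b + f*a + e*f  (split across the two parties)
    z0 = (c0 + e * b0 + f * a0 + e * f) % P        # party 0 adds e*f once
    z1 = (c1 + e * b1 + f * a1) % P
    return z0, z1

# Usage: 6 * 7 under secret sharing.
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))
z0, z1 = beaver_mul(share(6), share(7), triple)
assert (z0 + z1) % P == 42
```

Correctness follows from (a + e)(b + f) = ab + eb + fa + ef; the broadcast leaks nothing because e and f are uniformly masked by the triple.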
In this work, we present MPC-Pipe, an MPC pipeline inference technique that
uses two ML-specific approaches: 1) an inter-linear-layer pipeline and 2) an
inner-layer pipeline. These two techniques shorten the total inference runtime
of machine learning models. Our experiments show that, compared to current MPC
protocol implementations, MPC-Pipe reduces ML inference latency by up to 12.6%
when model weights are private and 14.48% when model weights are public.
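The pipelining idea can be shown in miniature: when a communication round and a piece of share computation are independent, running them concurrently hides the communication latency behind compute. The sketch below is a generic illustration of that overlap, not the paper's implementation; the `time.sleep` calls are stand-ins for network and compute time, and all durations are illustrative:

```python
import threading
import time

def broadcast_shares():
    """Simulated MPC communication round (network latency stand-in)."""
    time.sleep(0.05)

def local_compute():
    """Simulated computation on secret shares that does not depend on the
    in-flight broadcast (e.g. the next tile of a matrix multiplication)."""
    time.sleep(0.05)

# Serial schedule: communicate, then compute -- roughly 0.10 s total.
t0 = time.perf_counter()
broadcast_shares()
local_compute()
serial = time.perf_counter() - t0

# Pipelined schedule: the broadcast runs on a worker thread while the
# independent computation proceeds -- roughly 0.05 s total.
t0 = time.perf_counter()
worker = threading.Thread(target=broadcast_shares)
worker.start()
local_compute()
worker.join()
overlapped = time.perf_counter() - t0

assert overlapped < serial  # the overlap hides communication latency
```

The same principle applies across layers (start sending one layer's outputs while computing the next) and within a layer (interleave share computation with the broadcasts it does not depend on).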
Related papers
- HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one-billion-parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks [16.39761637882153]
We introduce CompactTag, a lightweight algorithm for generating MAC tags specifically tailored for linear layers in machine learning (ML) applications.
CompactTag speeds up this tag computation bottleneck by up to 23x, resulting in up to 1.47x total online phase runtime speedups for various ML workloads.
arXiv Detail & Related papers (2023-11-08T00:18:08Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Efficient Multi-stage Inference on Tabular Data [1.6371451481715193]
Conventional wisdom favors segregating ML code into services queried by product code via RPC APIs.
We simplify inference algorithms and embed them into the product code to reduce network communication.
By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50%.
arXiv Detail & Related papers (2023-03-21T04:01:55Z)
- Multi-Agent Automated Machine Learning [54.14038920246645]
We propose multi-agent automated machine learning (MA2ML) to handle joint optimization of modules in automated machine learning (AutoML).
MA2ML explicitly assigns credit to each agent according to its marginal contribution to enhance cooperation among modules, and incorporates off-policy learning to improve search efficiency.
Experiments show that MA2ML yields the state-of-the-art top-1 accuracy on ImageNet under constraints of computational cost.
arXiv Detail & Related papers (2022-10-17T13:32:59Z)
- A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules [8.224904698490626]
Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning accelerators.
We present a strategy using a deep reinforcement learning framework to emit a possibly invalid candidate partition that is then corrected by a constraint solver.
Our evaluation of a production-scale model, BERT, on real hardware reveals that the partitioning generated using the RL policy achieves 6.11% and 5.85% higher throughput.
arXiv Detail & Related papers (2021-12-07T23:40:28Z)
- CrypTen: Secure Multi-Party Computation Meets Machine Learning [25.21435023269728]
CrypTen is a software framework that exposes popular secure MPC primitives via abstractions common in modern machine-learning frameworks.
This paper describes the design of CrypTen and measures its performance on state-of-the-art models for text classification, speech recognition, and image classification.
arXiv Detail & Related papers (2021-09-02T14:36:55Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Lossless Compression of Efficient Private Local Randomizers [55.657133416044104]
Locally Differentially Private (LDP) Reports are commonly used for collection of statistics and machine learning in the federated setting.
In many cases the best known LDP algorithms require sending prohibitively large messages from the client device to the server.
This has led to significant efforts on reducing the communication cost of LDP algorithms.
arXiv Detail & Related papers (2021-02-24T07:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.