MPCFormer: fast, performant and private Transformer inference with MPC
- URL: http://arxiv.org/abs/2211.01452v1
- Date: Wed, 2 Nov 2022 19:43:22 GMT
- Title: MPCFormer: fast, performant and private Transformer inference with MPC
- Authors: Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang
- Abstract summary: We design the framework MPCFormer using secure multi-party computation (MPC) and Knowledge Distillation (KD).
MPCFormer significantly speeds up Transformer model inference in MPC settings while achieving similar ML performance to the input model.
We show that MPCFormer remains effective with different trained Transformer weights such as RoBERTa-Base and larger models including BERT-Large.
- Score: 64.23599808800738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling private inference is crucial for many cloud inference services that
are based on Transformer models. However, existing private inference solutions
for Transformers can increase the inference latency by more than 60x or
significantly compromise the quality of inference results. In this paper, we
design the framework MPCFormer using secure multi-party computation (MPC) and
Knowledge Distillation (KD). It can be used in tandem with many specifically
designed MPC-friendly approximations and trained Transformer models. MPCFormer
significantly speeds up Transformer model inference in MPC settings while
achieving similar ML performance to the input model. We evaluate MPCFormer with
various settings in MPC. On the IMDb dataset, we achieve similar performance to
BERT-Base, while being 5.3x faster. On the GLUE benchmark, we achieve 97% of
the performance of BERT-Base with a 2.2x speedup. We show that MPCFormer remains
effective with different trained Transformer weights such as RoBERTa-Base and
larger models including BERT-Large. In particular, we achieve similar
performance to BERT-Large, while being 5.93x faster on the IMDb dataset.
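The abstract's two ingredients lend themselves to a short illustration: replace the non-linearities that are expensive under MPC with polynomial approximations, then use knowledge distillation so the approximated model recovers the original model's accuracy. The PyTorch-style sketch below is only a rough rendering of that idea; the particular polynomials (a quadratic GELU and a shifted-square softmax substitute), the constant c, and the loss weighting alpha are illustrative assumptions rather than the paper's exact recipe.

    # Minimal sketch (not the authors' code): (1) swap MPC-expensive non-linearities
    # for cheap polynomial stand-ins, (2) distill the original "teacher" Transformer
    # into the approximated "student". Exact polynomials and loss weights are
    # illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def quad_gelu(x: torch.Tensor) -> torch.Tensor:
        # Quadratic stand-in for GELU; additions and squarings are cheap under
        # secret sharing, unlike erf()/tanh().
        return 0.125 * x * x + 0.25 * x + 0.5

    def quad_softmax(scores: torch.Tensor, c: float = 5.0) -> torch.Tensor:
        # Softmax replacement that avoids exp(): shift, square, then normalize.
        z = (scores + c) ** 2
        return z / z.sum(dim=-1, keepdim=True).clamp_min(1e-6)

    def distillation_loss(student_logits, teacher_logits,
                          student_hidden, teacher_hidden, alpha=0.5):
        # Match both output logits and hidden states so the approximated model
        # tracks the teacher despite the cruder non-linearities.
        kd_logits = F.kl_div(F.log_softmax(student_logits, dim=-1),
                             F.softmax(teacher_logits, dim=-1),
                             reduction="batchmean")
        kd_hidden = F.mse_loss(student_hidden, teacher_hidden)
        return alpha * kd_logits + (1 - alpha) * kd_hidden

During MPC inference only the distilled student runs; the teacher is needed only at distillation time, so the expensive protocols for exp() and GELU never appear in the private phase.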
Related papers
- Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [92.36510016591782]
We present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs).
Our method, called MOHAWK, is able to distill a Mamba-2 variant based on the Phi-1.5 architecture using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens.
Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models.
arXiv Detail & Related papers (2024-08-19T17:48:11Z) - An Empirical Study of Mamba-based Language Models [69.74383762508805]
Selective state-space models (SSMs) like Mamba overcome some shortcomings of Transformers.
We present a direct comparison between 8B-parameter Mamba, Mamba-2, and Transformer models trained on the same datasets.
We find that the 8B Mamba-2-Hybrid exceeds the 8B Transformer on all 12 standard tasks.
arXiv Detail & Related papers (2024-06-12T05:25:15Z) - Ditto: Quantization-aware Secure Inference of Transformers upon MPC [5.161569981377991]
We propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference.
We conduct extensive experiments on BERT and GPT-2 models to evaluate the performance of Ditto.
The results demonstrate that Ditto is about 3.14x-4.40x faster than MPCFormer and 1.44x-2.35x faster than the state-of-the-art work PUMA.
arXiv Detail & Related papers (2024-05-09T03:28:16Z) - SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC [34.63351580241698]
We introduce a comprehensive PPI framework called SecFormer to achieve fast and accurate PPI for Transformer models.
In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT-Base and BERT-Large, demonstrating its effectiveness and speed.
arXiv Detail & Related papers (2024-01-01T15:40:35Z) - MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision
Transformer with Heterogeneous Attention [11.999596399083089]
We propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC.
With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 3.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction.
arXiv Detail & Related papers (2022-11-25T08:37:17Z) - Efficient Attention-free Video Shift Transformers [56.87581500474093]
This paper tackles the problem of efficient video recognition.
Video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum.
We extend our formulation in the video domain to construct Video Affine-Shift Transformer.
arXiv Detail & Related papers (2022-08-23T17:48:29Z) - Shatter: An Efficient Transformer Encoder with Single-Headed
Self-Attention and Relative Sequence Partitioning [14.164984597158501]
The Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT.
We present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information.
We conduct extensive experiments showing that Shatter achieves better performance than BERT.
arXiv Detail & Related papers (2021-08-30T07:42:12Z) - Pay Attention to MLPs [84.54729425918164]
We show that gMLP can perform as well as Transformers in key language and vision applications.
Our comparisons show that self-attention is not critical for Vision Transformers, as gMLP can achieve the same accuracy.
In general, our experiments show that gMLP can scale as well as Transformers over increased data and compute.
arXiv Detail & Related papers (2021-05-17T17:55:04Z) - Face Transformer for Recognition [67.02323570055894]
We investigate the performance of Transformer models in face recognition.
The models are trained on a large-scale face recognition database, MS-Celeb-1M.
We demonstrate that Transformer models achieve comparable performance to CNNs with a similar number of parameters and MACs.
arXiv Detail & Related papers (2021-03-27T03:53:29Z)