East: Efficient and Accurate Secure Transformer Framework for Inference
- URL: http://arxiv.org/abs/2308.09923v1
- Date: Sat, 19 Aug 2023 06:26:14 GMT
- Title: East: Efficient and Accurate Secure Transformer Framework for Inference
- Authors: Yuanchao Ding, Hua Guo, Yewei Guan, Weixin Liu, Jiarong Huo, Zhenyu
Guan, Xiyong Zhang
- Abstract summary: We propose a framework \emph{East} to enable efficient and accurate secure Transformer inference.
Compared to Iron, we achieve about 1.8$\times$ lower communication within 1.2$\times$ lower runtime.
- Score: 7.887332345182056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer has been successfully used in practical applications, such as
ChatGPT, due to its powerful capabilities. However, users' input is leaked to the
model provider during the service. With growing attention to privacy,
privacy-preserving Transformer inference is in demand for such services.
Secure protocols for non-linear functions are crucial in privacy-preserving
Transformer inference, yet they are not well studied. Designing practical
secure protocols for non-linear functions is therefore challenging but critical
to model performance. In this work, we propose a framework \emph{East} to enable
efficient and accurate secure Transformer inference. Firstly, we propose a new
oblivious piecewise polynomial evaluation algorithm and apply it to the
activation functions, which reduces the runtime and communication of GELU by
over 1.5$\times$ and 2.5$\times$, compared to prior art. Secondly, the secure
protocols for softmax and layer normalization are carefully designed to
faithfully maintain the desired functionality. Thirdly, several optimizations
are carried out to enhance the overall efficiency. We applied
\emph{East} to BERT, and the results show that the inference accuracy remains
consistent with the plaintext inference without fine-tuning. Compared to Iron,
we achieve about 1.8$\times$ lower communication within 1.2$\times$ lower
runtime.
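To make the first contribution concrete, the sketch below fits low-degree polynomials to GELU on a few sub-intervals and clamps the tails, which is the plaintext analogue of what East's oblivious piecewise polynomial evaluation computes. The segment boundaries, polynomial degree, and fitting range here are illustrative assumptions rather than the paper's published parameters, and the secret-sharing and oblivious piece-selection machinery is omitted entirely.

```python
# Plaintext sketch of a piecewise-polynomial GELU approximation (illustrative only).
# The segments, degree, and clamping range below are assumptions for demonstration;
# East evaluates its polynomials obliviously over secret-shared inputs, which is not shown.
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# Fit one low-degree polynomial per segment inside [-4, 4];
# outside that range GELU is essentially 0 (left tail) or x (right tail).
SEGMENTS = [(-4.0, -1.5), (-1.5, 1.5), (1.5, 4.0)]
DEGREE = 3
PIECES = [(lo, hi, np.polyfit(np.linspace(lo, hi, 512),
                              gelu(np.linspace(lo, hi, 512)), DEGREE))
          for lo, hi in SEGMENTS]

def gelu_piecewise(x):
    x = np.asarray(x, dtype=np.float64)
    out = np.where(x < -4.0, 0.0, x)  # tails: 0 on the left, identity on the right
    for lo, hi, coeffs in PIECES:
        mask = (x >= lo) & (x < hi)
        out = np.where(mask, np.polyval(coeffs, x), out)
    return out

xs = np.linspace(-6.0, 6.0, 2001)
print("max abs error vs. exact GELU:", np.max(np.abs(gelu_piecewise(xs) - gelu(xs))))
```

In East's oblivious evaluation, the piece selection and polynomial evaluation are performed over secret-shared values, so the input, and hence the active segment, stays hidden from both parties.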
Related papers
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of Prompting and context-based fine-tuning methods to match the performance of full parameter fine-tuning.
We implement an algorithm that only needs to introduce and fine-tune a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods like full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z)
- Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference [16.328220661765744]
We introduce a novel plug-in method Comet to reduce the communication cost without compromising the inference performance.
We evaluate our Comet on BERT and RoBERTa models with GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups.
arXiv Detail & Related papers (2024-05-24T18:43:00Z)
- From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa$_\mathrm{Large}$ and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5$_\mathrm{XXL}$.
arXiv Detail & Related papers (2024-02-02T21:25:46Z)
- SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models [34.63351580241698]
We introduce an advanced optimization framework called SecFormer to achieve fast and accurate PPI for Transformer models.
In terms of efficiency, SecFormer is 3.56 and 3.58 times faster than Puma for BERT$_\text{BASE}$ and BERT$_\text{LARGE}$, respectively.
arXiv Detail & Related papers (2024-01-01T15:40:35Z)
- Secure Transformer Inference Protocol [15.610303095235372]
Security of model parameters and user data is critical for Transformer-based services, such as ChatGPT.
Recent strides in secure two-party protocols have successfully addressed security concerns in serving Transformer models, but their adoption is practically infeasible due to the prohibitive cryptographic overheads involved.
We present STIP, the first secure Transformer inference protocol without any inference accuracy loss.
arXiv Detail & Related papers (2023-11-14T14:37:23Z)
- Exploring the Benefits of Differentially Private Pre-training and Parameter-Efficient Fine-tuning for Table Transformers [56.00476706550681]
Table Transformer (TabTransformer) is a state-of-the-art neural network model, while Differential Privacy (DP) is an essential component to ensure data privacy.
In this paper, we explore the benefits of combining these two aspects together in the scenario of transfer learning.
arXiv Detail & Related papers (2023-09-12T19:08:26Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand from cloud service users.
We introduce \textit{THE-X}, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)