East: Efficient and Accurate Secure Transformer Framework for Inference
- URL: http://arxiv.org/abs/2308.09923v1
- Date: Sat, 19 Aug 2023 06:26:14 GMT
- Title: East: Efficient and Accurate Secure Transformer Framework for Inference
- Authors: Yuanchao Ding, Hua Guo, Yewei Guan, Weixin Liu, Jiarong Huo, Zhenyu
Guan, Xiyong Zhang
- Abstract summary: We propose a framework \emph{East} to enable efficient and accurate secure Transformer inference.
Compared to Iron, we achieve about 1.8$\times$ lower communication within 1.2$\times$ lower runtime.
- Score: 7.887332345182056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer has been successfully used in practical applications, such as
ChatGPT, due to its powerful capabilities. However, users' input is leaked to the
model provider during the service. With growing attention to privacy,
privacy-preserving Transformer inference is in demand for such services.
Secure protocols for non-linear functions are crucial in privacy-preserving
Transformer inference, yet they are not well studied. Designing practical
secure protocols for non-linear functions is therefore challenging but critical
to model performance. In this work, we propose a framework \emph{East} to enable
efficient and accurate secure Transformer inference. Firstly, we propose a new
oblivious piecewise polynomial evaluation algorithm and apply it to the
activation functions, which reduces the runtime and communication of GELU by
over 1.5$\times$ and 2.5$\times$, compared to prior art. Secondly, the secure
protocols for softmax and layer normalization are carefully designed to
faithfully maintain the desired functionality. Thirdly, several optimizations
are carried out to enhance the overall efficiency. We applied
\emph{East} to BERT, and the results show that the inference accuracy remains
consistent with the plaintext inference without fine-tuning. Compared to Iron,
we achieve about 1.8$\times$ lower communication within 1.2$\times$ lower
runtime.
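To make the first contribution concrete, the sketch below fits low-degree polynomials to GELU on a few sub-intervals and clamps the tails, which is the plaintext analogue of what East's oblivious piecewise polynomial evaluation computes. The segment boundaries, polynomial degree, and fitting range here are illustrative assumptions rather than the paper's published parameters, and the secret-sharing and oblivious piece-selection machinery is omitted entirely.

```python
# Plaintext sketch of a piecewise-polynomial GELU approximation (illustrative only).
# The segments, degree, and clamping range below are assumptions for demonstration;
# East evaluates its polynomials obliviously over secret-shared inputs, which is not shown.
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# Fit one low-degree polynomial per segment inside [-4, 4];
# outside that range GELU is essentially 0 (left tail) or x (right tail).
SEGMENTS = [(-4.0, -1.5), (-1.5, 1.5), (1.5, 4.0)]
DEGREE = 3
PIECES = [(lo, hi, np.polyfit(np.linspace(lo, hi, 512),
                              gelu(np.linspace(lo, hi, 512)), DEGREE))
          for lo, hi in SEGMENTS]

def gelu_piecewise(x):
    x = np.asarray(x, dtype=np.float64)
    out = np.where(x < -4.0, 0.0, x)  # tails: 0 on the left, identity on the right
    for lo, hi, coeffs in PIECES:
        mask = (x >= lo) & (x < hi)
        out = np.where(mask, np.polyval(coeffs, x), out)
    return out

xs = np.linspace(-6.0, 6.0, 2001)
print("max abs error vs. exact GELU:", np.max(np.abs(gelu_piecewise(xs) - gelu(xs))))
```

In East's oblivious evaluation, the piece selection and polynomial evaluation are performed over secret-shared values, so the input, and hence the active segment, stays hidden from both parties.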
Related papers
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of Prompting and context-based fine-tuning methods to match the performance of full parameter fine-tuning.
We implement an algorithm that only needs to introduce and fine-tune a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods like full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z)
- Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference [16.328220661765744]
We introduce a novel plug-in method Comet to reduce the communication cost without compromising the inference performance.
We evaluate our Comet on BERT and RoBERTa models with GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups.
arXiv Detail & Related papers (2024-05-24T18:43:00Z)
- From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa$_\mathrm{Large}$ and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5$_\mathrm{XXL}$.
arXiv Detail & Related papers (2024-02-02T21:25:46Z)
- SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models [34.63351580241698]
We introduce an advanced optimization framework called SecFormer to achieve fast and accurate PPI for Transformer models.
In terms of efficiency, SecFormer is 3.56 and 3.58 times faster than Puma for BERT$_\text{BASE}$ and BERT$_\text{LARGE}$, respectively.
arXiv Detail & Related papers (2024-01-01T15:40:35Z)
- Secure Transformer Inference Protocol [15.610303095235372]
Security of model parameters and user data is critical for Transformer-based services, such as ChatGPT.
Recent strides in secure two-party protocols have successfully addressed security concerns in serving Transformer models, but their adoption is practically infeasible due to the prohibitive cryptographic overheads involved.
We present STIP, the first secure Transformer inference protocol without any inference accuracy loss.
arXiv Detail & Related papers (2023-11-14T14:37:23Z)
- Exploring the Benefits of Differentially Private Pre-training and Parameter-Efficient Fine-tuning for Table Transformers [56.00476706550681]
Table Transformer (TabTransformer) is a state-of-the-art neural network model, while Differential Privacy (DP) is an essential component to ensure data privacy.
In this paper, we explore the benefits of combining these two aspects together in the scenario of transfer learning.
arXiv Detail & Related papers (2023-09-12T19:08:26Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand from cloud service users.
We introduce \textit{THE-X}, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)